We have an issue with a groupBy query.
We are using Kafka to ingest data, and when we run select * from datasource we can find the newest data from Kafka.
But if we do a groupBy, this data takes about 20 minutes to show up.
It looks like Druid needs to index it first.
We are not sure whether it is expected that we wait for Druid to index the data, or whether there is some config we can set to make all data available without delay.
Please advise if anyone has had the same issue.
Are you using a time bound or a query granularity on the groupby?
That is very odd, @honeymoose, as the data is coming from the same place.
GROUP BY operates on the same data. As @Rachel_Pedreschi says - are you selecting different time periods?
What is indicating to you that the data is 20 minutes late?
Thanks @petermarshallio and @Rachel_Pedreschi
I think we found the reason for this by adjusting the query filter.
This is one of our filters:
We are sending data through Kafka, and the Kafka data has no value for the column “Error”.
In the Druid settings, we didn’t set a default value for Error either.
After Druid takes the data from Kafka, it sets the value of column “Error” to 0.
But it looks like a filter cannot match this 0. (I am not sure why.)
The solution for this issue would be one of:
- set a value for column “Error” in the message.
- set a default value for column “Error” if the message doesn’t have it (I don’t know how to do this yet; maybe @Rachel_Pedreschi or @petermarshallio can help with that).
Or maybe I missed something else?
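
A minimal sketch of the first option (setting the value in the message before it goes to Kafka), assuming the messages are JSON objects built in Python. The helper names `with_defaults` and `to_kafka_value` and the `DEFAULTS` table are hypothetical; only the column name “Error” and the value 0 come from the posts above.

```python
import json

# Hypothetical default table: only "Error" and 0 are taken from the thread.
DEFAULTS = {"Error": 0}


def with_defaults(record, defaults=DEFAULTS):
    """Return a copy of `record` with missing or None fields filled from `defaults`."""
    filled = dict(record)
    for key, value in defaults.items():
        if filled.get(key) is None:
            filled[key] = value
    return filled


def to_kafka_value(record):
    """Serialize the defaulted record to UTF-8 JSON bytes for a Kafka producer."""
    return json.dumps(with_defaults(record)).encode("utf-8")
```

The bytes returned by `to_kafka_value` can be passed as the message value to whichever Kafka client is in use, so every row reaches Druid with an explicit “Error” value. For the second option, an expression transform in the ingestion spec's `transformSpec` (for example one built on `nvl`) may be worth trying on the Druid side, though that has not been verified against this setup.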
Ooooh I wonder if it’s sometimes
Maybe this will help explain? Maybe?!