Inconsistent group by result on historical node?

Hi guys,
I've run into a weird problem: there is an hourly segment with about 3M records (the number comes from a count timeseries query). When I issue a groupBy query with a count metric and dimensions a, b, the result contains many rows, one of which is "a1,b1,5", the three columns being a, b and count.
When I issue another groupBy query with the same metric but dimensions a, b, c, one of the result rows is "a1,b1,c1,6", i.e. a, b, c and count.
A select query shows there are only 5 rows with a=a1, b=b1 and c=c1. Has anyone encountered this?
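For single-valued dimensions like these, the two groupBy results have to agree: summing the (a, b, c) counts over c must give back the (a, b) count, so no (a1, b1, c) group can exceed 5. A minimal sketch of that consistency check, assuming the rows (or the two result sets) are available client-side; the helper names `group_count` and `rollup_violations` are hypothetical, not Druid APIs:

```python
from collections import Counter


def group_count(rows, dims):
    """Count rows per combination of `dims`, like a groupBy with a count aggregator."""
    return Counter(tuple(row[d] for d in dims) for row in rows)


def rollup_violations(coarse, fine):
    """Roll (a, b, c) counts up to (a, b) and report keys whose totals disagree.

    Returns {key: (coarse_count, rolled_up_count)} for every mismatch.
    """
    rolled = Counter()
    for key, n in fine.items():
        rolled[key[:-1]] += n  # drop the last dimension (c)
    keys = set(coarse) | set(rolled)
    return {
        k: (coarse.get(k, 0), rolled[k])
        for k in keys
        if coarse.get(k, 0) != rolled[k]
    }


# Consistent data: five (a1, b1, c1) rows and one (a1, b2, c2) row.
rows = [{"a": "a1", "b": "b1", "c": "c1"}] * 5 + [{"a": "a1", "b": "b2", "c": "c2"}]
assert rollup_violations(group_count(rows, ["a", "b"]), group_count(rows, ["a", "b", "c"])) == {}

# The results reported above: (a1, b1) -> 5 but (a1, b1, c1) -> 6.
print(rollup_violations(Counter({("a1", "b1"): 5}), Counter({("a1", "b1", "c1"): 6})))
```

Any non-empty result points at the query engine rather than the data, which is consistent with the select query returning only 5 matching rows.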

The 5 and 6 are exact integer counts, so it's probably not the floating-point number issue, and the queries are sent directly to the same historical node. The behavior is the same on Druid 0.6.160 and Druid 0.7.1.1…

I'm step-debugging 0.6.160 to see what the hell is going on, but the conditional breakpoints make it really slow…

Any thoughts?

Hi Weinan,

How are you issuing the select query, is it with pagination? My second question is, are any of these dimensions multi-value dimensions?

Hi Fangjin,

Yes, I run the select query with pagination (threshold=100m); the groupBy query runs without pagination.

These dimensions aren't multi-value dimensions; they come from Avro records, so the schema is fixed…

Still doing my step debugging…

On Tuesday, May 26, 2015 at 1:16:45 AM UTC+8, Fangjin Yang wrote:

Hi Fangjin,

I finally got some time to work it out. Please check whether I am correct in this PR: https://github.com/druid-io/druid/pull/1406

Bottom line: my problem is gone…

On Friday, May 29, 2015 at 3:36:09 PM UTC+8, Zhao Weinan wrote:

Thanks for the contribution, Weinan! Is there any way you could write a unit test that reproduces the issue? GroupByQueryRunnerTest.java is where the groupBy query tests live.