Nested groupby query not working as expected

Hi Experts, I’m new to Druid. Recently I want to use Druid to do a frequency statistics like following:

my raw data like: {“user”:“user1”, “date”:“2016/12/1”, “viewcount”: 2}, {“user”:“user1”, “date”:“2016/12/1”, “viewcount”:3}, {“user”:“user2”, “date”:“2016/12/2”, “viewcount”: 3}.

my requirement is to build a global user daily viewcount frequency baseline based on the raw data, which means the frequency of a daily viewcount of an individual user. Base on the above raw data example, the expected output should be:

{“viewcount”:5, “frequency”: 1, “viewcount”:3, “frequency”:1}, which means, it occurs once that the daily viewcount of a single user is 5, and it also occurs once that the daily viewcount of a single user is 3.

I tried to use Druid nested groupby query to get the expected result. The query statement is like below:

{

“intervals”: [

“1900-01-01/2100-01-01”

],

“queryType”: “groupBy”,

“granularity”: “all”,

“dimensions”: [

“dailyviewcount”

],

“aggregations”: [

{

“type”: “count”,

“name”: “frequency”

}

],

“dataSource”: {

“type”: “query”,

“query”: {

“intervals”: [

“1900-01-01/2100-01-01”

],

“granularity”: “all”,

“context”: {

“timeout”: 60000

},

“queryType”: “groupBy”,

“dataSource”: “myrawdata”,

“aggregations”: [

{

“type”: “doubleSum”,

“name”: “dailyviewcount”,

“fieldName”:“viewcount”

}

],

“dimensions”: [

“user.id”,

“date”

]

}

}

}

The inner group query can return the correct temp result like:

user.id, timestamp, dailyviewcount

user1, 2016/12/1, 5

user2, 2016/12/2, 3

But the final result is not as expected:

dailyviewcount, result

‘’, 8

What I expects is:

dailyviewcount, result

5, 1

3, 1

Could any one help me with this issue? B.T.W, I also use MySQL to do the same query, and MySQL can return the expected result to me. SQL statement like below which works well:

select temp.dailyviewcount, count(*) as frequency from

(select user,sum(viewcount) as dailyviewcount, date from myrawdata group by user, date) as temp

group by temp.dailyviewcount

Not sure anything I need add to the query statement in Druid for this case.

thanks

xuanbei

Hey Xuanbei,

This looks like it should work. Does it work if you use the v2 engine? It’s able to do some kinds of nested queries that v1 can’t. Whether or not it works there, could you please raise a GitHub issue (a test case would be great too, in GroupByQueryRunnerTest)?

If you want to try the v2 engine, our docs are here: http://druid.io/docs/latest/querying/groupbyquery.html. And there’s a good writeup about how to enable it here: https://raygun.com/blog/2016/12/druid-groupby-v2-engine/

Hi Gian,

It works when using v2 engine. Thank you!

在 2016年12月23日星期五 UTC+8下午11:09:47,Gian Merlino写道: