Doubt about groupBy queries

Hi,

For my use case I need to generate reports with the count of distinct users, grouped by activity and by day, for the last month. As an SQL query it would look like this:

select activity, day, count(distinct userIdentifier) from TestDruid where hierarchy1 = 'sample_value1' and hierarchy2 = 'sample_value2' and (activityType = 'Type1' or activityType = 'Type2') group by activity, day order by 3

and the Druid query I have formed is:

{
  "queryType": "groupBy",
  "dataSource": "TestDruid",
  "dimensions": ["activity", "users"],
  "limitSpec": { "type": "default", "columns": ["activity"] },
  "granularity": { "type": "period", "period": "P1D", "timeZone": "America/Los_Angeles", "origin": "2012-03-20T00:00:00-08:00" },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "hierarchy1", "value": "sample_value1" },
      { "type": "selector", "dimension": "hierarchy2", "value": "sample_value2" },
      {
        "type": "or",
        "fields": [
          { "type": "selector", "dimension": "activityType", "value": "Type1" },
          { "type": "selector", "dimension": "activityType", "value": "type2" }
        ]
      }
    ]
  },
  "intervals": [ "2012-03-19T00:00:00-00:00/2012-04-20T00:00:00.000" ],
  "aggregations": [
    { "type": "cardinality", "name": "users", "fieldNames": "userIdentifier", "byRow": false }
  ]
}

  1. When I run this query I get an empty response, but with the basic queries below I do get results:

curl -XPOST -H'Content-type: application/json' "http://localhost:8080/druid/v2/?pretty" -d'{"queryType":"groupBy","dataSource":"TestDruid","granularity":{"type":"period","period":"PT1m"},"dimensions":[],"aggregations":[{"type":"count","name":"rows"}],"intervals":["2013-01-01T00:00:00.000/2015-05-03T00:00:00.000"]}'

curl -XPOST -H'Content-type: application/json' "http://localhost:8080/druid/v2/?pretty" -d'{"queryType":"select","dataSource":"TestDruid","dimensions":[],"metrics":[],"granularity":"all","intervals":["2013-01-01T00:00:00.000/2015-05-03T00:00:00.000"],"pagingSpec":{"pagingIdentifiers":{},"threshold":5}}'

Is my query wrong? I am asking specifically about the query syntax and the placement of each query parameter.

  2. How do I specify the dimensions on which the group-by should be performed? (I understand granularity is doing the grouping at the day level.)

  3. Since I am specifying a time zone in the granularity, I assume it groups the data based on that time zone. Should I insert all the data in GMT?

  4. Should the origin and the start value of the interval be the same? And if there is no data from that time, will it throw any errors?

Thank you in advance

1) The biggest difference I can see between the groupBy you provided
and the ones that you say give you results is that the ones that give
you results all specify a larger interval and the groupBy has a
filter. Try setting the granularity to "all" and setting the interval
the same as those. Do you get results then? If not, try removing the
filter, do you get results then?
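
For instance, a stripped-down query along those lines (just a sketch, reusing the dataSource, dimension, and the wide interval from your working queries; I've written fieldNames as a list here) should return rows if the data is there:

curl -XPOST -H'Content-type: application/json' "http://localhost:8080/druid/v2/?pretty" -d'{
  "queryType": "groupBy",
  "dataSource": "TestDruid",
  "granularity": "all",
  "dimensions": ["activity"],
  "aggregations": [
    { "type": "cardinality", "name": "users", "fieldNames": ["userIdentifier"], "byRow": false }
  ],
  "intervals": ["2013-01-01T00:00:00.000/2015-05-03T00:00:00.000"]
}'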

2) The dimension you are grouping by (other than time) is "activity"
so your dimensions array should just be activity. Also, you aren't
specifying a limit clause, so you can likely remove the limitSpec.
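
That part of the query would then just be:

  "dimensions": ["activity"]

with the limitSpec entry dropped entirely.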

3) The granularity you provided will give you days according to PST.
The data that you ingested has likely already been adjusted to UTC,
but if your input data used string timestamps and didn't specify a
timezone, that might not be true.
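
To illustrate with made-up rows, the first timestamp below pins the instant explicitly, while the second has no offset and will be interpreted according to whatever default the parser applies (typically UTC), which can shift events across a PST day boundary:

  {"timestamp": "2012-03-20T10:00:00-08:00", "activity": "Login", "userIdentifier": "u1"}
  {"timestamp": "2012-03-20T10:00:00", "activity": "Login", "userIdentifier": "u1"}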

4) origin and start value do not need to be (and probably shouldn't be
in the vast majority of cases) the same. The origin is telling it how
to compute the time granularity. Imagine if you said you wanted 3 day
granularity, where do 3 days start and end? The origin answers that
question. If there is no data for the time period specified, the
query will "succeed" and you will get nothing back. It's essentially
the same as if you had an empty table in a relational database and did
a query that didn't match any rows.
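
To illustrate with made-up values, a granularity spec like this buckets rows into 3-day windows counted from the given origin:

  "granularity": {
    "type": "period",
    "period": "P3D",
    "timeZone": "America/Los_Angeles",
    "origin": "2012-03-19T00:00:00-08:00"
  }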

--Eric

Eric,

Thank you for clearing up my doubts. Regarding the query, the interval was wrong; once that was corrected I started getting the response. It's working fine now.

-Balakrishnan