How to get Totals and sub-totals

Hello Druid Gurus,

i was wondering if there is a way to get or enable totals from druid query result sets. if yes do we have to include this logic in ‘postAggregations’. Please let me know.

Regards

Karteek

hi,
not sure what you meant. But in general, you can use longSum/doubleSum aggregators for sum. optionally, post aggregators could be used to sum values from multiple columns (in the same row of resultset)

pls see
http://druid.io/docs/0.8.0/querying/aggregations.html
http://druid.io/docs/0.8.0/querying/post-aggregations.html

– Himanshu

– Himanshu

Himanshu,

i was actually looking into possibility of aggregating values from multiple Rows.

karteek

hmmm… may be a nested-groupby query can help you. when doing a group-by query, it is totally possible to have another group-by query in the “dataSource” field of the query. In general, aggregators/post-aggregators do not combine multiple rows.

if that does not help then, can you describe your usecase a little? What does the data look like, what spec did you use to index that data into druid, what is the query and what do you want to get?

– Himanshu

Thanks for quick reply Himanshu. i will try nested group by and try it out. basically what i am trying to achive is haveing a column which give me %share metric to what each row contribution is towards total value.

karteek

Himashu,

i executed it with nested group by and but i get following error. i cant find anything related to querySegmentSpec in druid docs.

“error” : “Instantiation of [simple type, class io.druid.query.groupby.GroupByQuery] value failed: querySegmentSpec can’t be null”

below is my query

{

“queryType”: “groupBy”,

“dataSource”:

{

“type”: “query”,

“query”: {

“queryType”: “groupBy”,

“dataSource”: “QBRDataHdp”,

“granularity”: “all”,

“dimensions”: [

“transaction_reference_date”

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “time_of_day”,

“value”: 12

},

{

“type”: “selector”,

“dimension”: “transaction_currency”,

“value”: “USD”

}

]

},

“aggregations”: [

{

“type”: “doubleSum”,

“fieldName”: “NTPV_USD_AMT”,

“name”: “NTPV_USD_AMT”

},

{

“type”: “doubleSum”,

“fieldName”: “NET_CNT”,

“name”: “NET_CNT”

},

{

“type”: “doubleSum”,

“fieldName”: “SUCCESS_CNT”,

“name”: “SUCCESS_CNT”

}

]

},

“intervals”: [

“2015-01-01/2015-04-01”

]

},

“granularity”: “all”,

“dimensions”: ,

“aggregations”: [

{

“type”: “doubleSum”,

“fieldName”: “NTPV_USD_AMT”,

“name”: “TOT_NTPV_USD_AMT”

},

{

“type”: “doubleSum”,

“fieldName”: “NET_CNT”,

“name”: “TOT_NET_CNT”

},

{

“type”: “doubleSum”,

“fieldName”: “SUCCESS_CNT”,

“name”: “TOT_SUCCESS_CNT”

}

],

“intervals”: [

“2015-01-01/2015-04-01”

]

}

Karteek

Karteek, I think the “intervals” on your inner query is in the wrong spot. It should be one level deeper.

Thanks a lot Gian & Himanshu , Nested query worked after moving intervals one level up.

However i have one additional request that i would like the query to do. I would also like query to return data for inner query. Is it possible ? currently it is displaying aggregated numbers from outer query.

Karteek

hi,

I have not done it myself but may be using a union query (see http://druid.io/docs/0.8.0/querying/datasource.html#union-data-source ), where one query is your inner query and other query is the nested group-by query may give you close to something youwhat want (or may be druid will just give you some exception if you tried this :slight_smile:

In general though, the use case sounds like that of a “self join” which is not very easily doable in druid really.

– Himanshu

I don’t think Druid currently has the ability to return both the inner and outer queries. What you can do is either issue both queries separately, or just issue the inner query and then do the totaling on the client side.

Thanks a lot Himanshu and Gian for taking time to answer my questions. i will go with the suggestion of issuing queries separately.

karteek

Hi Karteek,

Is it possible to post the oracle sql equivalent to the above json query please.

I am looking to implement the below query through json and want to makesure if Druid will be able to handle these type of queries

SELECT ENT_CUST_ID FROM TABLE1

WHERE ENT_ACCT_STAT_DESC = ‘OPEN’

AND SOR_ID NE 6

AND ENT_CUST_ID IN

(SELECT ENT_CUST_ID FROM TABLE1

WHERE SOR_ID = 6

GROUP BY ENT_CUST_ID)

GROUP BY 1

Thanks,

Ravali

you will have to break this down to 2 queries where the 1st query output will be passed as a list to filter the output of 2nd query. you might need client side help to pass this list as filter to 2nd query.

karteek