Druid nested groupBy / filter on metrics

Hi,

We are trying to use a nested groupBy to filter on metric values. The inner groupBy filters the metric values using a having clause, and the outer groupBy does the actual work of grouping on the appropriate dimensions. We are using the groupBy v2 strategy.
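For readers following along, the query shape described above might look roughly like the sketch below. It only builds the JSON body; the datasource, dimension, and metric names are hypothetical placeholders, not taken from our actual setup, and the aggregator type (longSum) is an assumption.

```python
import json


def nested_group_by(data_source, interval, inner_dimensions, outer_dimensions,
                    metric, threshold):
    """Build a nested groupBy: the inner query filters on a metric, the outer one groups."""
    inner = {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [interval],
        "dimensions": inner_dimensions,
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
        # The having clause keeps only groups whose summed metric exceeds the threshold.
        "having": {"type": "greaterThan", "aggregation": metric, "value": threshold},
    }
    outer = {
        "queryType": "groupBy",
        # A dataSource of type "query" makes the inner groupBy the input of the outer one.
        "dataSource": {"type": "query", "query": inner},
        "granularity": "all",
        "intervals": [interval],
        "dimensions": outer_dimensions,
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
    }
    return outer


if __name__ == "__main__":
    # All names and values below are hypothetical.
    query = nested_group_by(
        "events", "2016-11-01/2016-11-02",
        ["country", "device"], ["country"], "clicks", 100,
    )
    print(json.dumps(query, indent=2))
```

The inner query groups on the finer set of dimensions so that the having filter sees per-group metric totals; the outer query then regroups the surviving rows.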

We are hitting the “Maximum number of rows [500000] reached” error. We could raise druid.query.groupBy.maxResults above 500K, but that doesn’t sound like a scalable solution.

Is there a more efficient way to filter by metric values? In our case the nested groupBy has a severe performance impact. For one day's worth of data, a normal groupBy responded in 150 ms, which is quite impressive. For the same date range, the nested groupBy took 7 seconds and then failed with the error above. Are there any official benchmarks on nested groupBy performance?

Thanks!

Siva

Hey Siva,

“Maximum number of rows [500000] reached” means you’re not using groupBy v2, as that error message can only be generated by v1. Could you please double check your settings?

Thanks, Gian, for the very quick reply.

Below are the broker runtime properties we added for the v2 strategy:

druid.processing.numMergeBuffers=10
druid.query.groupBy.defaultStrategy=v2
druid.query.groupBy.maxMergingDictionarySize=100000000
druid.query.groupBy.maxOnDiskStorage=2000000000

The druid.processing.numMergeBuffers=10 property was also added to the middle manager and historical runtime properties. I added it specifically to the historical and middle manager runtime properties rather than to the common runtime properties. I changed MaxDirectMemorySize on the broker, historical nodes, and middle managers as per the formula, then restarted the middle manager, historical, and broker roles. I don't think any changes are required for the overlord or coordinator.
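The formula itself is not quoted in this thread. Assuming it is the sizing rule from the Druid documentation (direct memory of at least buffer size times merge buffers plus processing threads plus one), the arithmetic can be sketched as follows. The buffer size and thread count in the example are hypothetical; only numMergeBuffers=10 comes from the settings above.

```python
def min_direct_memory_bytes(buffer_size_bytes, num_merge_buffers, num_threads):
    """Lower bound for -XX:MaxDirectMemorySize under the assumed sizing rule.

    Assumption: sizeBytes * (numMergeBuffers + numThreads + 1), as described
    in the Druid documentation; check the docs for your version.
    """
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)


if __name__ == "__main__":
    # Hypothetical values: 512 MiB processing buffers and 7 processing threads.
    needed = min_direct_memory_bytes(512 * 1024 * 1024, 10, 7)
    print(f"-XX:MaxDirectMemorySize should be at least {needed} bytes")
```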

I added the query context below. I have tried this context in the nested (inner) query as well as in the outer query.

"context": {
  "groupByStrategy": "v2",
  "useCache": false,
  "populateCache": false
},
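A small sketch of applying that context to both levels of a nested query at once, so neither level is missed. The helper name with_context and the example datasource are hypothetical; the context keys are the ones shown above.

```python
import copy

# Context keys taken from the snippet above.
V2_CONTEXT = {"groupByStrategy": "v2", "useCache": False, "populateCache": False}


def with_context(query, context=V2_CONTEXT):
    """Return a copy of the query with the context set on it and on any nested query."""
    result = copy.deepcopy(query)
    result.setdefault("context", {}).update(context)
    data_source = result.get("dataSource")
    # A dict dataSource of type "query" wraps an inner query; recurse into it.
    if isinstance(data_source, dict) and data_source.get("type") == "query":
        data_source["query"] = with_context(data_source["query"], context)
    return result


if __name__ == "__main__":
    # Hypothetical, heavily trimmed nested query.
    nested = {
        "queryType": "groupBy",
        "dataSource": {
            "type": "query",
            "query": {"queryType": "groupBy", "dataSource": "events"},
        },
    }
    print(with_context(nested))
```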

I may still be missing something here.

Those look like the right configs – are you sure everything is running 0.9.2+?

Oh, I get it. We are using imply-1.3.0, which is Druid 0.9.1.1. Is there any documentation for upgrading to Imply 2.x?
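For anyone else checking the same thing, the version comparison can be sketched as below. The 0.9.2 minimum comes from the reply above; the helper names are hypothetical. The version string could be read from each node (I believe the /status endpoint reports it, but treat that as an assumption for your deployment).

```python
import re

# Minimum version for groupBy v2, per the reply above ("0.9.2+").
MIN_V2_VERSION = (0, 9, 2)


def parse_version(version):
    """Turn '0.9.1.1' or '0.9.2-rc1' into a tuple of ints, ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_group_by_v2(version):
    """True if the given Druid version is at least 0.9.2."""
    return parse_version(version) >= MIN_V2_VERSION


if __name__ == "__main__":
    for v in ("0.9.1.1", "0.9.2"):
        print(v, supports_group_by_v2(v))
```

Every node that handles the query would need to pass this check, not just the broker.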

Hi Siva,

You can find upgrade documentation here: https://imply.io/docs/latest/release