Improve performance of groupBy queries

Hi,

How can I improve the performance of groupBy queries when I have very small segments (each a few KB in size) and 15-20 dimensions, 7-8 of which are high-cardinality? The total size of the datasource is ~12MB, yet groupBy queries over 6 months of data sometimes take more than 10 seconds.

I have tried allocating more resources to the broker and historical nodes and tuning the threading and JVM parameters, and I have provisioned far more compute than a 12MB datasource should need.

  • Do MiddleManager nodes also need more resources? My understanding is that when querying data that has already been handed off to historical nodes, MiddleManagers are not involved in the query path.

  • Does druid.query.groupBy.numParallelCombineThreads help speed up groupBy queries?
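One low-risk way to test the second question, rather than changing the runtime property cluster-wide, is to override it per query through the groupBy query context key numParallelCombineThreads and compare timings. The sketch below only builds the query and prepares the POST to the broker's native query endpoint (/druid/v2/); the broker URL, datasource name, dimension names and interval are all hypothetical placeholders, not values from this thread.

```python
import json
import urllib.request

# Hypothetical values: replace with your own broker URL, datasource and dimensions.
BROKER_URL = "http://localhost:8082/druid/v2/"
DATASOURCE = "my_datasource"


def build_groupby_query(combine_threads):
    """Build a groupBy query that overrides numParallelCombineThreads for this query only."""
    return {
        "queryType": "groupBy",
        "dataSource": DATASOURCE,
        "granularity": "all",
        "dimensions": ["dim_a", "dim_b"],  # hypothetical dimension names
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": ["2019-01-01/2019-07-01"],  # roughly six months
        "context": {"numParallelCombineThreads": combine_threads},
    }


def make_request(query):
    """Prepare (but do not send) a POST request to the broker's native query endpoint."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        BROKER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    for threads in (1, 4):
        req = make_request(build_groupby_query(threads))
        print(threads, req.get_method(), req.full_url)
        # To time it against a live broker, send it with:
        # with urllib.request.urlopen(req) as resp: resp.read()
```

Running the same query with 1 and with a larger value, and comparing wall-clock times, would show whether parallel combining matters for this workload at all.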

Thanks,

Prathamesh

Hi Prathamesh,
I don’t have a concrete answer to the issue you are facing, but do you by any chance have a similar dataset stored as 1 segment of 12MB, or as 10 segments of roughly 1MB each?

I am inclined to think that having thousands of small, KB-sized segments might be the culprit here.
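To make the comparison concrete, the arithmetic below estimates how many segments a 12MB datasource implies at different segment sizes, assuming (hypothetically) that "a few KB" means about 4KB per segment. It also builds a minimal compaction task spec, which is one way to merge many small segments into fewer larger ones; the datasource name and interval are placeholders, and the exact spec layout has changed across Druid versions, so check the compaction task docs for the version you run before submitting it to the Overlord's /druid/indexer/v1/task endpoint.

```python
import json

KB = 1024
MB = 1024 * 1024


def estimate_segment_count(total_bytes, avg_segment_bytes):
    """Rough number of segments, rounding any partial segment up."""
    return -(-total_bytes // avg_segment_bytes)  # ceiling division


def build_compaction_task(datasource, interval):
    """Minimal compaction task spec asking Druid to rewrite `interval` into fewer segments."""
    return {
        "type": "compact",
        "dataSource": datasource,
        "ioConfig": {
            "type": "compact",
            "inputSpec": {"type": "interval", "interval": interval},
        },
    }


if __name__ == "__main__":
    total = 12 * MB
    # Assumption: "a few KB" per segment taken as 4 KB.
    print("~4KB segments:", estimate_segment_count(total, 4 * KB))
    print("~1MB segments:", estimate_segment_count(total, 1 * MB))
    print("single segment:", estimate_segment_count(total, 12 * MB))
    # Hypothetical datasource name and interval.
    task = build_compaction_task("my_datasource", "2019-01-01/2019-07-01")
    print(json.dumps(task, indent=2))
```

Under the 4KB assumption this works out to 3072 segments versus 12 or 1, so each query pays per-segment scheduling and merging overhead thousands of times over for very little data.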

Thank you.

–siva