How can druid.query.groupBy.numParallelCombineThreads be used?


I read the following in the docs:

druid.query.groupBy.numParallelCombineThreads - Hint for the number of parallel combining threads. This should be larger than 1 to turn on the parallel combining feature. The actual number of threads used for parallel combining is min(druid.query.groupBy.numParallelCombineThreads, druid.processing.numThreads).
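To make the quoted rule concrete, here is a small sketch of the arithmetic it describes. It only restates the documented formula and is not Druid's own code. The function name and the choice to return None when the feature is off are assumptions made for this illustration.

```python
from typing import Optional


def effective_combine_threads(num_parallel_combine_threads: int,
                              num_processing_threads: int) -> Optional[int]:
    """Hypothetical helper restating the documented rule.

    Returns None when the hint is not larger than 1 (parallel combining stays off),
    otherwise min(druid.query.groupBy.numParallelCombineThreads,
    druid.processing.numThreads).
    """
    if num_parallel_combine_threads <= 1:
        return None  # feature not turned on per the docs
    return min(num_parallel_combine_threads, num_processing_threads)


print(effective_combine_threads(8, 4))   # capped by processing threads
print(effective_combine_threads(4, 16))  # hint is the smaller value
print(effective_combine_threads(1, 16))  # parallel combining disabled
```

In other words, raising the hint above druid.processing.numThreads buys nothing, because the processing thread count caps it.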

Has anyone used this particular parameter in their cluster configuration? Does this help speed up groupBy queries?



The biggest performance gain for groupBy queries comes from making sure you have enough merge buffers (druid.processing.numMergeBuffers) to accommodate concurrent groupBy queries. If you are grouping on only one dimension, prefer a TopN query instead, as it is more performant. Performance also depends on the query itself, since some functions are inherently slower than others. The more data you can prune ahead of time (for example, with filters and narrow intervals) before results are shipped to the Broker, the better.

Rommel Garcia