Re: [druid-user] Druid 0.22.0 - Query performance

I don’t have details, but how does it perform with "forceLimitPushDown": true and "applyLimitPushDownToSegment": false?
The docs say applyLimitPushDownToSegment can negatively affect performance, although they say that applies when a large number of segments take part in the query, and you only have 24. Maybe it makes Druid scan more segments when the limit could have been reached with fewer?
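
For reference, here is a minimal sketch of how those two context flags could be attached to a native groupBy query before posting it to the Broker. Only the two flag names come from this thread; the data source, interval, dimensions, aggregator, and limit are hypothetical placeholders, so substitute your own.

```python
import json

# The two flags discussed above. Everything else in this query is a
# hypothetical placeholder, not taken from the original poster's query.
query_context = {
    "forceLimitPushDown": True,
    "applyLimitPushDownToSegment": False,
}

query = {
    "queryType": "groupBy",
    "dataSource": "example_datasource",        # hypothetical name
    "intervals": ["2021-01-01/2021-01-02"],    # hypothetical interval
    "granularity": "all",
    "dimensions": ["dim_a", "dim_b"],          # hypothetical dimensions
    "aggregations": [{"type": "count", "name": "rows"}],
    "limitSpec": {
        "type": "default",
        "limit": 100,
        # Ordering by an aggregate rather than a grouping key.
        "columns": [{"dimension": "rows", "direction": "descending"}],
    },
    "context": query_context,
}

# JSON body that would be sent to the Broker's native query endpoint.
payload = json.dumps(query, indent=2)
print(payload)
```

Running the same query with each combination of the two flags and comparing timings should show which setting is responsible for the slowdown.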

Hello -

One of our developers (at Imply) let me know that we’ve seen this issue and think it is likely due to the implementation of forceLimitPushDown.
IIUC, Druid is doing string comparisons for the groupBy: when merging results from segments, the dictionaries are not available, so it has to store and sort the actual strings. (And you’re grouping by 57 dimensions; presumably some or many of them are strings.) Hopefully this will be improved in the future. This is what he said to me, for more detail:

This is because LimitedBufferHashGrouper needs to compare those values to keep only the top N entries in memory, but the dictionary is not available when merging (not combining) per-segment results.
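
To make that concrete, here is a toy sketch of the idea. It is not Druid's actual code, and every function name in it is made up for illustration. It assumes each segment has its own sorted string dictionary, so integer IDs are cheap to compare within one segment, but the same string gets a different ID in another segment and the merge step has to fall back to comparing the strings themselves.

```python
import heapq

def build_dictionary(values):
    """Toy per-segment dictionary: sorted distinct strings -> integer IDs."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def top_n_within_segment(values, dictionary, n):
    """Top N inside one segment by comparing integer IDs only.
    Because the dictionary is sorted, ID order matches string order."""
    ids = {dictionary[v] for v in values}
    smallest_ids = heapq.nsmallest(n, ids)
    id_to_value = {i: v for v, i in dictionary.items()}
    return [id_to_value[i] for i in smallest_ids]

def top_n_across_segments(per_segment_results, n):
    """Top N when merging segments: IDs from different dictionaries are
    not comparable, so the strings themselves are stored and compared."""
    merged = set()
    for result in per_segment_results:
        merged.update(result)
    return heapq.nsmallest(n, merged)

# Hypothetical values of one string dimension in two segments.
segment_a = ["apple", "cherry", "banana"]
segment_b = ["banana", "date"]

dict_a = build_dictionary(segment_a)
dict_b = build_dictionary(segment_b)

# The same string has different IDs in the two segments.
print(dict_a["banana"], dict_b["banana"])

top_a = top_n_within_segment(segment_a, dict_a, 2)
top_b = top_n_within_segment(segment_b, dict_b, 2)
merged_top = top_n_across_segments([top_a, top_b], 2)
print(merged_top)
```

With 57 grouping dimensions, each comparison during the merge may have to walk through many string columns, which fits the slowdown described above.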

So it’s a result of the current code, and it sounds like you’re better off not using forceLimitPushDown for this query. I hope that helps!

It sounds like forceLimitPushDown might currently be useful if you’re grouping by numeric columns. Hopefully in the future it’ll work well for strings, too.