Druid Group by query taking too long

Hey Guys,

I ran some tests and can’t explain the resulting metrics. Any help is appreciated!

Tests on datasource “visits”: 1 segment (1 shard) per hour (avg segment size ~3MB), 40 dimensions, 2 metrics (count, longSum).

QueryType: Group by on 1 metric (count)

1 broker node: r3.2xlarge

6 historical nodes: r3.2xlarge (8 cores, with 7 workers configured), 160GB SSD for segment cache

query on 60 days (1440 segments):

user: 3359ms

Historical query/time: 2796ms

Total for all segments on 1 Historical node (query/segmentAndCache/time): 19277ms

query on 120 days (2880 segments):

Total Segments = 120*24 = 2880

query/segment/time = 37162ms (avg per Historical node, summed over all of its segments)

Query time = 5428ms

This keeps increasing: for a query over 1 year, query time = 15000ms.
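A rough sanity check on the numbers above, as a sketch only. It assumes the per-node segment time is spread evenly over the 7 configured workers, that segments are spread evenly over the 6 historicals with no replication effect, and that query/segmentAndCache/time and query/segment/time are comparable. The function name is made up for illustration.

```python
# Rough check: does summed per-segment time divided by the worker count
# explain the observed query time? All inputs are the figures quoted above.

WORKERS = 7            # processing threads configured per historical
HISTORICALS = 6        # historical nodes in the cluster


def wall_clock_estimate_ms(total_segment_time_ms, workers=WORKERS):
    """Estimated elapsed time if segment scans are spread evenly over workers."""
    return total_segment_time_ms / workers


# label -> (total segments, summed segment time on one node, observed time)
runs = {
    "60 days": (1440, 19277, 2796),   # observed = Historical query/time
    "120 days": (2880, 37162, 5428),  # observed = reported query time
}

for label, (segments, segment_time_sum_ms, observed_ms) in runs.items():
    per_node = segments / HISTORICALS
    estimate = wall_clock_estimate_ms(segment_time_sum_ms)
    per_segment = segment_time_sum_ms / per_node
    print(f"{label}: {per_node:.0f} segments/node, "
          f"{per_segment:.1f} ms/segment, "
          f"estimate {estimate:.0f} ms vs observed {observed_ms} ms")
```

Under those assumptions the estimate lands close to the observed times in both runs, which would suggest the query time is growing roughly linearly with the number of segments each node has to scan.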

Historical runtime properties

Any help, guys?


It is hard to answer such a question without full knowledge of the data, the query, the node configs, the number of rows returned by the query, etc.

But here are a few hints that can help you tune your cluster.

First, you are doing a groupBy on one dimension; have you thought about using a TopN query?

Second, the segment size seems very small. Can you make sure that the number of rows per segment is about 5M?

Third, are you doing any rollup when ingesting the data? What is the value of queryGranularity at ingestion?

Finally, the max JVM heap size can be tweaked. Do you see a lot of GC?
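To make the first hint concrete, here is a minimal sketch of what a TopN query body could look like, built as a Python dict and printed as JSON. The dimension name, interval and threshold are placeholders, since the original query was not posted. It also assumes the ingested "count" metric should be summed with longSum at query time, which is the usual way to count rows over rolled-up data.

```python
import json


def build_topn_query(datasource, dimension, intervals, threshold=100):
    """Build a Druid native TopN query body as a plain dict.

    `dimension`, `intervals` and `threshold` are illustrative placeholders;
    adjust them to match the real groupBy query being replaced.
    """
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "intervals": intervals,
        "granularity": "all",
        "dimension": dimension,
        # Rank results by the aggregated "count" output below.
        "metric": "count",
        "threshold": threshold,
        "aggregations": [
            # Sum the ingested count metric rather than counting Druid rows.
            {"type": "longSum", "name": "count", "fieldName": "count"}
        ],
    }


query = build_topn_query(
    datasource="visits",
    dimension="some_dimension",            # hypothetical dimension name
    intervals=["2016-01-01/2016-03-01"],   # hypothetical interval
)
print(json.dumps(query, indent=2))
```

The printed JSON can be POSTed to the broker the same way as the existing groupBy query. Keep in mind that TopN only returns the top `threshold` values and is approximate, so it fits best when you do not need every group back.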