I ran some tests and can’t explain the metrics result. Any help is appreciated!
Tests on datasource “visits”: 1 segment (1 shard) per day (~400-500MB), 96 dimensions, 6 metrics
QueryType: Timeseries on 6 metrics (2 HyperUnique, 2 LongSum, 1 DoubleSum)
1 broker node: c4.4xlarge
1 historical node: r3.8xlarge (32 cores, with 31 workers configured)
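For context, here is a minimal sketch of what the timeseries query body could look like. The datasource, interval dates, and all metric/column names below are hypothetical placeholders, not the real schema:

```python
import json

# Hypothetical timeseries query body; "visits" is the datasource from the
# post, but the interval and every metric/column name are made up.
query = {
    "queryType": "timeseries",
    "dataSource": "visits",
    "granularity": "all",
    "intervals": ["2016-01-01/2016-01-08"],  # example 7-day window
    "aggregations": [
        {"type": "hyperUnique", "name": "uv", "fieldName": "uv_sketch"},
        {"type": "hyperUnique", "name": "sessions", "fieldName": "session_sketch"},
        {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
        {"type": "longSum", "name": "pageviews", "fieldName": "pageviews"},
        {"type": "doubleSum", "name": "revenue", "fieldName": "revenue"},
    ],
}
print(json.dumps(query, indent=2))
```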
query on 7 days (7 segments):
Historical query/time: 2292ms
max(Historical query/segmentAndCache/time): 2283ms
avg(Historical query/segmentAndCache/time): 2081ms
query on 30 days (30 segments):
Historical query/time: 5649ms
max(Historical query/segmentAndCache/time): 5640ms
avg(Historical query/segmentAndCache/time): 4635ms
(other metrics are not relevant)
The fact is, the docs say query/segmentAndCache/time gives us the “Milliseconds taken to query individual segment or hit the cache (if it is enabled on the historical node).”
I expected the max of this metric to be roughly the same for both queries, but we can clearly see that in the second query it takes more than twice as long.
My historical has 31 workers, so it should be able to scan and compute up to 31 segments in parallel. I expected the extra time in the second query to come from merging, but it seems like… NO!
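To spell out the expectation above: with 31 processing threads and at most 30 segments, every segment should scan in parallel, so the slowest segment should set the wall time regardless of whether we query 7 or 30 days. A toy model (the segment times below are illustrative, taken loosely from the 7-day numbers above):

```python
import math

def ideal_scan_time(segment_times, num_threads):
    """Wall time if segments are scheduled greedily on num_threads workers.

    Simplified model: when there are more segments than threads, they run
    in waves, each wave bounded by the slowest segment.
    """
    if len(segment_times) <= num_threads:
        # Everything fits in one wave: wall time == slowest segment.
        return max(segment_times)
    waves = math.ceil(len(segment_times) / num_threads)
    return waves * max(segment_times)

# 7 segments at ~2.3 s each (roughly the 7-day query above): one wave.
print(ideal_scan_time([2.3] * 7, num_threads=31))   # 2.3
# 30 segments still fit in a single wave of 31 threads, so ideally the
# 30-day query should also take ~2.3 s — yet the observed time was ~5.6 s.
print(ideal_scan_time([2.3] * 30, num_threads=31))  # 2.3
```

Under this model the 30-day query should cost no more scan time than the 7-day one, which is why the doubled per-segment times are surprising.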