One of the biggest factors attracting me to Druid was that, given enough hardware, one can drive query latency down to whatever level is needed.
However, I am having trouble achieving that.
My data source is 40GB, replicated across 2 historicals (40GB per historical * 2 = 80GB total).
This 40GB covers 4 months of data, but we only ever query 3 months at a time.
My aim is to get the 3-month query (across approx. 30GB of data) down to about 200ms latency.
Here is what I tried:
2 historicals on Google Cloud.
n1-highmem-16 instances for the historicals.
Each instance has 16 cores and 104GB RAM.
Thus each query got 32 cores to execute.
Here we saw response times of up to 800ms-1s.
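For reference, a historical `runtime.properties` for a 16-core box along these lines typically dedicates cores minus one to processing threads (the values below are illustrative only, not my exact config):

```properties
# Illustrative historical tuning for a 16-core / 104GB machine -- not my
# actual config. One core is conventionally left free for the OS and HTTP.
druid.processing.numThreads=15
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numMergeBuffers=4
druid.server.http.numThreads=40
# Must be large enough to hold the full 40GB of segments locally
druid.server.maxSize=300000000000
```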
For the second try, each instance has 32 cores and 120GB RAM.
Thus each query got 64 cores to execute.
However, I now see latencies of around 1s.
So not only did the latency not improve when I doubled the cores, it actually got higher!
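One thing I wondered about is whether parallelism is already capped by segment count rather than core count, since a historical scans one segment per processing thread. If segments are, say, ~500MB each, the math works out like this (segment size and per-historical data split are assumptions, not measured values):

```python
# Back-of-the-envelope check: query parallelism on a historical is bounded by
# the number of segments the query touches, not by raw core count. The ~500MB
# segment size and 15GB-per-historical split are assumptions for illustration.
import math

def scan_waves(data_gb, segment_mb, threads):
    """Number of sequential 'waves' of segment scans needed for one query."""
    segments = math.ceil(data_gb * 1024 / segment_mb)
    return math.ceil(segments / threads)

per_historical_gb = 15  # ~30GB queried, spread over 2 historicals
for threads in (15, 31, 63):  # typical numThreads for 16-, 32-, 64-core boxes
    print(f"{threads} threads -> {scan_waves(per_historical_gb, 500, threads)} wave(s)")
```

Under these assumptions each historical holds ~31 segments for the 3-month query, so 31 threads already scan everything in a single wave and doubling the cores again adds nothing.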
Here is a graph showing performance:
The historical-32 nodes are the second-try instances with 32 cores;
historical-1 and historical-2 are the first-try instances with 16 cores.
In all cases, the config was:
Is there anything I could be doing wrong here?
Why doesn't increasing the core count lower latency?
What can I do to reach my desired latency level?
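For completeness, this is the kind of client-side harness I could use to time queries end to end, to separate network/broker overhead from historical scan time (the broker URL and query body are placeholders, not my actual setup):

```python
# Minimal client-side timer for a Druid native query, posted to the broker.
# The broker URL and query below are placeholders.
import json
import time
import urllib.request

def time_query(broker_url, query):
    """Return wall-clock seconds for one query round trip."""
    body = json.dumps(query).encode()
    req = urllib.request.Request(
        broker_url, data=body,
        headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.monotonic() - start

# Example usage (placeholder datasource and interval):
# elapsed = time_query("http://broker:8082/druid/v2/",
#                      {"queryType": "timeseries", "dataSource": "my_ds",
#                       "granularity": "all",
#                       "intervals": ["2016-01-01/2016-04-01"],
#                       "aggregations": [{"type": "count", "name": "rows"}]})
```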