Slow queries because of Historical configuration

Hi all,

We’re seeing some pretty slow queries and were wondering which is a better choice:

  1. Improve the Historical Node in terms of memory and CPU.
  2. Add a second Historical to our Druid cluster.

Which of these seems to be the better option for improving query speed?



The answer depends on which resource is the bottleneck during those slow queries.
For example, if disk I/O is the bottleneck, adding more CPU or memory won’t help, but adding more disks probably will.
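One quick way to see which resource is saturated is to snapshot the standard Linux tools while a slow query is running. A rough sketch (tool availability and exact output fields vary by distro):

```shell
# Snapshot system resources during a slow query window.
vmstat 1 3                 # 'r': runnable threads (CPU pressure); 'wa': time blocked on I/O
top -b -n 1 | head -n 12   # which process is burning CPU right now
free -m                    # is the OS page cache being squeezed out by large heaps?
```

A consistently high `wa` column with low `us`/`sy` points at disk, while pegged `us` across cores points at CPU.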

To find the bottleneck, observe the system metrics (CPU, memory, disk, network) during those slow queries; you can also use the various metrics emitted by Druid itself to dig deeper.
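To get Druid’s own metrics, you need to enable an emitter and optionally some monitors on the Historical. A minimal sketch for `runtime.properties`, assuming the built-in logging emitter is acceptable (swap in your own metrics pipeline as needed):

```properties
# Emit metrics (query/time, query/segment/time, segment/scan/pending, ...)
# to the service log so you can correlate them with slow queries.
druid.emitter=logging
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
```

Watching `query/segment/time` and `segment/scan/pending` in particular can tell you whether segments are queuing up behind exhausted processing threads.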

If none of the system resources is exhausted, that may indicate a Druid misconfiguration, and adding more resources won’t help.

We’ve been seeing a huge increase in CPU when firing several queries at the same time and/or fairly “complex” queries. We’re barely using GroupBys, as recommended in the docs.

I’ll add metrics to see where the bottlenecks lie. In the meantime, as a test, we added one more Historical and have seen fewer timeouts and fewer high CPU peaks.