Help with query performance, cluster configuration

Hi all,

I’m trying to get a general feeling for how much hardware is needed to satisfy certain query-performance criteria, and I was hoping to get some pointers from you.

Two of our datasources have about 25 dimensions and 10 metrics, two of which are hyperUnique. One (DS-H) has hourly granularity with a retention of 3 days, while the second one (DS-D) has daily granularity and contains the data of DS-H aggregated per day.

DS-H: An hour of data is 8 GB of segments (sharded into 32 files of ~250 MB each), so a day of data is about 200 GB and the whole datasource is about 600 GB.

DS-D: A day of data is 110 GB of segments (sharded into 200+ files of ~450 MB each); we currently hold ca. 15 days of data, i.e. roughly 1.5 TB.

All in all, we have around 2 TB of segment data to serve right now (replicated onto one extra historical). For this we set up 6 historicals, each with 16 cores, 120 GB of memory, and 900 GB of HDD (no SSD).
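(As a quick sanity check of these figures, here is a small back-of-the-envelope script; it only restates the numbers above and assumes replication factor 2, i.e. the “one extra historical” per segment:)

```python
# Back-of-the-envelope sizing, using only the figures quoted above.

ds_h_gb = 8 * 24 * 3        # DS-H: 8 GB/hour, 3 days retention  -> ~576 GB
ds_d_gb = 110 * 15          # DS-D: 110 GB/day, ~15 days         -> ~1650 GB

total_gb      = ds_h_gb + ds_d_gb    # ~2.2 TB unreplicated
replicated_gb = total_gb * 2         # ~4.4 TB on disk (replication factor 2)
per_node_gb   = replicated_gb / 6    # spread over 6 historicals

print(f"unreplicated : {total_gb} GB")
print(f"per node     : {per_node_gb:.0f} GB (druid.server.maxSize is 880 GB)")
```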

Here is our current historical-config:

runtime.properties:

```
druid.host=X.Y.Z.W
druid.port=8081
druid.service=druid/historical

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=5326600000

druid.processing.buffer.sizeBytes=1036870912
druid.processing.numThreads=15
druid.server.http.numThreads=10

druid.server.maxSize=880000000000
druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 880000000000}]

druid.monitoring.monitors=["com.metamx.metrics.JvmThreadsMonitor", "com.metamx.metrics.SysMonitor", "io.druid.server.metrics.HistoricalMetricsMonitor", "com.metamx.metrics.JvmMonitor", "io.druid.client.cache.CacheMonitor"]

druid.request.logging.type=file
druid.request.logging.dir=/var/log/queries/
```

jvm.properties:

```
-server
-Xmx12294m
-Xms12294m
-XX:NewSize=6147m
-XX:MaxNewSize=6147m
-XX:MaxDirectMemorySize=20479m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Djava.io.tmpdir=/mnt/tmp
```

Per historical we give around 20 GB of direct memory for processing and 12 GB of heap, which leaves around 85 GB of free memory for (memory-mapped) segments.
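(For reference, this is roughly how the per-node budget works out; the direct-memory line uses the commonly documented buffer.sizeBytes × (numThreads + 1) guideline, so treat it as a rule of thumb rather than an exact requirement:)

```python
# Rough per-historical memory budget, using the settings above.

total_ram_gb = 120
heap_gb      = 12                      # -Xmx12294m
direct_gb    = 20                      # -XX:MaxDirectMemorySize=20479m

buffer_bytes = 1_036_870_912           # druid.processing.buffer.sizeBytes
num_threads  = 15                      # druid.processing.numThreads

# Rule of thumb: direct memory >= buffer.sizeBytes * (numThreads + 1)
min_direct_gb = buffer_bytes * (num_threads + 1) / 1024**3

page_cache_gb = total_ram_gb - heap_gb - direct_gb

print(f"direct memory needed : ~{min_direct_gb:.1f} GB (configured: {direct_gb} GB)")
print(f"left for page cache  : ~{page_cache_gb} GB per historical")
```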

Query times vary from a few seconds to half a minute, depending on whether we query metrics of type longSum/doubleSum or hyperUnique.

So far we have only been testing with a single user/broker accessing the historicals, so we assume there is no bottleneck there.

Caching is disabled; even when we query the same interval/datasource and only switch filters/metrics, queries are still a bit slow (even though I think all segments should fit into memory: 6 × 85 GB = 510 GB of memory, while one day is 200 GB, ×2 for the replica).
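(The arithmetic I’m basing that on, spelled out:)

```python
# Can the "hot" day of DS-H sit entirely in the cluster's page cache?

historicals        = 6
page_cache_node_gb = 85                  # free memory per historical (see above)
hot_day_gb         = 200 * 2             # one day of DS-H, x2 for the replica

cluster_cache_gb = historicals * page_cache_node_gb
print(f"cluster page cache : {cluster_cache_gb} GB")   # 510 GB
print(f"hot data, one day  : {hot_day_gb} GB")         # 400 GB -> should fit,
# assuming the shards are spread evenly and nothing else evicts them
```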

Based on these numbers, what would you say we could improve? (*) Any help is appreciated.

Is it a config issue, or simply too little hardware to serve everything fast enough? (Our goal is sub-2-second responses for simple queries that filter on one dimension and return three metrics.)

My next approach would be to add more machines until most of the data fits into memory and each historical has to scan fewer shards, and/or to try machines with SSDs. Does anyone have benchmarks comparing SSDs vs. HDDs?

Thanks in advance and kind regards,

André

PS: (*) When looking at the requests on one historical, I noticed that (due to the high number of shards) for a request over a day of data a single historical still scans about 130 shards, and even with 15 processing threads that can take a while, which would account for the high query times. I assume this would improve when adding more machines, but our team still wants to figure out how to improve without just throwing hardware at it.
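(Roughly modelled with the numbers from above; just a sketch of the thread math:)

```python
import math

# 130 shards for the queried day on one historical, 15 processing threads:
# each thread has to scan several shards one after another ("waves" of work).

shards_on_node     = 130
processing_threads = 15

for scale in (1, 2, 3):                    # 1x, 2x, 3x the current node count
    shards = math.ceil(shards_on_node / scale)
    waves  = math.ceil(shards / processing_threads)
    print(f"{scale}x historicals -> ~{shards} shards/node, "
          f"~{waves} sequential scans per thread")
```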

Hi Andre,

At a very quick glance (I just checked the basics like heap and processing threads) your configs look reasonable. I would suggest you look into where your bottleneck is: CPU or disk I/O. The best way to determine that is to look at iostat or top while you are running Druid queries. If you’re CPU bound then you need more CPUs. If you’re I/O bound then you need either more memory or faster disks.
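If you want something slightly more structured than eyeballing top, a tiny sampler along these lines works too (this assumes the psutil package is installed; `iostat -x 1` shows the same picture):

```python
# Sample CPU and disk throughput once per second while queries are running.
# High CPU with an idle disk -> CPU bound; low CPU with saturated reads -> I/O bound.

import time
import psutil

psutil.cpu_percent()                         # prime the CPU counter
prev = psutil.disk_io_counters()
for _ in range(30):                          # sample for ~30 seconds
    time.sleep(1)
    cpu = psutil.cpu_percent()               # CPU % since the previous call
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1024**2
    prev = cur
    print(f"cpu {cpu:5.1f}%   disk reads {read_mb:8.1f} MB/s")
```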

Or another possible solution is to side-step the hardware constraints and instead shrink the size of your data. If you have not spent much time playing with rollup then try that (basically: reduce the number of dimensions you store). It is common to get pretty extreme compression from rollup, like 5–30x.
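To make the rollup idea concrete, here is a toy sketch (made-up event fields, not your schema): rows that share the truncated timestamp and the remaining dimensions collapse into one, so dropping a high-cardinality dimension directly cuts the row count.

```python
from collections import defaultdict

# Made-up events: (hour, country, device, user_id, clicks)
events = [
    ("2017-05-01T10", "DE", "mobile", "u1", 1),
    ("2017-05-01T10", "DE", "mobile", "u2", 1),
    ("2017-05-01T10", "DE", "mobile", "u3", 2),
]

def rollup(rows, dims):
    """Group by (hour + kept dimensions) and sum the metric."""
    rolled = defaultdict(int)
    for hour, country, device, user_id, clicks in rows:
        record = {"country": country, "device": device, "user_id": user_id}
        key = (hour,) + tuple(record[d] for d in dims)
        rolled[key] += clicks
    return rolled

print(len(rollup(events, ["country", "device", "user_id"])))  # 3 rows survive
print(len(rollup(events, ["country", "device"])))             # 1 row, clicks summed to 4
```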

Hi Glan,

Thanks for the reply. Good to hear that we’re not completely off with the config.

We invested some more time doing exactly what you suggested and checked iotop/htop: for fresh queries, I/O was the main problem, since the segments first had to be loaded into memory, while for the hyperUnique metrics we were mostly capped by CPU.

Just today we adjusted our cluster to have more machines as a first step, which reduced the stress, and we’re discussing how to proceed from here.

What I generally take from this is that more, smaller machines are better than fewer, bigger machines (assuming similar overall CPU/memory), since the data is spread more evenly? Is there a cut-off point with regard to balancing, i.e. once the number of historicals exceeds X, will shards only be placed on up to X historicals, or will they always be spread across the whole cluster (ignoring tiered datasources, and assuming equal results from the balancing/cost function)?

Will update, when possible.

André

Hi André,

What I generally take from this is that more, smaller machines are better than fewer, bigger machines (assuming similar overall CPU/memory), since the data is spread more evenly? Is there a cut-off point with regard to balancing, i.e. once the number of historicals exceeds X, will shards only be placed on up to X historicals, or will they always be spread across the whole cluster (ignoring tiered datasources, and assuming equal results from the balancing/cost function)?

In general I usually prefer the opposite: fewer bigger machines. The main reason is that Druid does two-level merging for each query, with the first level happening at each historical and the second at the broker. This scales better if the historicals are larger (the broker is less likely to be a bottleneck). That being said, there are Druid clusters out there running smoothly with hundreds of nodes, so having a lot of nodes is not very problematic.
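To illustrate the point, a toy model (not Druid code): each historical pre-merges the results of its own segments, and the broker then merges one partial result per historical, so the broker’s fan-in grows with the node count.

```python
from collections import Counter

def scatter_gather(segment_results, num_historicals):
    # First level: each historical merges the segments it holds.
    partials = [Counter() for _ in range(num_historicals)]
    for i, seg in enumerate(segment_results):
        partials[i % num_historicals].update(seg)
    # Second level: the broker merges one partial result per historical.
    broker = Counter()
    for p in partials:
        broker.update(p)
    return broker, len(partials)

segments = [Counter({"DE": 10, "US": 5})] * 130      # toy per-segment results
for n in (3, 6, 12):
    _, fan_in = scatter_gather(segments, n)
    print(f"{n} historicals -> broker merges {fan_in} partial results")
```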