Druid Cluster Sizing

We have over 300 TB of data in AWS S3 that we currently ingest into Druid, which results in about 500K segments (also stored in AWS S3). We use AWS instances with SSDs for the Historical and MiddleManager nodes.

We are seeing degraded performance on the cluster whenever ingestions or multiple large queries are running. Should we provision more resources for the amount of data we have?

Our current cluster configuration is as follows:

2 Master Nodes - m5.2xlarge (8 vCPUs, 32 GB memory)
16 Query Nodes (Router and Broker services) - r5.4xlarge (16 vCPUs, 128 GB memory)
180 Historical Nodes - i3.4xlarge (16 vCPUs, 122 GB memory)
10 MiddleManager Nodes - i3.4xlarge (16 vCPUs, 122 GB memory)

Broker service runtime properties:

druid.server.http.numThreads=20
druid.broker.http.numConnections=16
druid.broker.http.maxQueuedBytes=500000000
druid.server.http.defaultQueryTimeout=600000
druid.broker.http.numMaxThreads=80
druid.processing.buffer.sizeBytes=536870912
druid.processing.numMergeBuffers=10
druid.processing.numThreads=30
druid.broker.balancer.type=connectionCount
druid.server.http.maxSubqueryRows=2147483647

Router service runtime properties:

druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100

Historical service runtime properties:

druid.server.http.numThreads=300

druid.processing.buffer.sizeBytes=536870912
druid.processing.numMergeBuffers=8
druid.processing.numThreads=30

druid.server.maxSize=3078632557773

druid.query.groupBy.maxOnDiskStorage=8000000000

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=21474836480

MiddleManager service runtime properties:

druid.worker.capacity=15

druid.indexer.runner.javaOptsArray=["-server","-Xms1g","-Xmx10g","-XX:MaxDirectMemorySize=8g","-Duser.timezone=UTC","-Dfile.encoding=UTF-8","-Djava.io.tmpdir=/tmp/","-XX:+ExitOnOutOfMemoryError","-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid.indexer.task.restoreTasksOnRestart=true

druid.server.http.numThreads=300

druid.processing.buffer.sizeBytes=104857600
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15

This won’t be a complete answer to your question, as questions like this (!) often need to be answered through a consultancy engagement!

You may want to check these guiding notes from the docs:

Are you collecting metrics from Druid? They’re pretty much essential.
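
If you’re not already emitting them, here’s a minimal sketch of turning metric emission on in common.runtime.properties, assuming the simple logging emitter (swap in whatever emitter matches your monitoring stack):

# Monitors add JVM and query-count stats; query metrics such as query/time and
# query/segment/time are emitted once an emitter is configured.
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.server.metrics.QueryCountStatsMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info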

(For what it’s worth, it sounds like you are core-bound…)

Thanks Peter! We have gone through the Basic Cluster Tuning guide, and that's how we came up with our current configuration. However, it seems we have some shortcoming that's resulting in sluggish performance.

I want to approach an instance size upgrade objectively, and that's where I need some input. The Basic Cluster Tuning guide doesn't talk about how to size your cluster and configuration based on the number of segments and the data size; it talks about sizing your connections and memory.
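
For reference, some rough arithmetic from the numbers above (all approximate, and treating the 300 TB as segment data rather than raw source data):

~300 TB / ~500K segments ≈ ~600 MB per segment on average, which is within the commonly recommended 300-700 MB range
~500K segments / 180 Historicals ≈ ~2,800 segments per Historical node
druid.server.maxSize ≈ 3.08 TB per Historical x 180 nodes ≈ ~554 TB of total segment-serving capacity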

Any guidance here would be helpful.

Gotcha 🙂

I added a reply to this thread a few days back – maybe something here will help at some level?

I would also emphasize again the importance of metrics collection. For example, knowing the number of segments scanned per query tells you how many cores are engaged (one segment is scanned by one core), and segment scan time tells you the fastest a query could possibly get. So if you find that segment scan time is something like 50 seconds, you know you have an issue with your segment sizes or their contents. If you have something like a thousand segments being scanned in a query, you know to look at things like secondary partitioning and filters (especially on time), or any other approach that reduces the number of segments picked up and scanned at query time, before you simply add more cores to cope with the number of segments that need to be scanned.
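
For illustration only, here’s a minimal sketch of what secondary partitioning looks like in an ingestion or compaction tuningConfig, assuming a hypothetical dimension called customer_id that your queries commonly filter on:

"partitionsSpec": {
  "type": "single_dim",
  "partitionDimension": "customer_id",
  "targetRowsPerSegment": 5000000
}

If queries filter on the partition dimension, Druid can prune those segments before they are ever scanned, which directly reduces the number of segments (and therefore cores) each query needs.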

My theory about core-boundedness was based on the impact on ingestion, which also consumes cores, so perhaps there is contention for that resource somewhere.

Metrics on consumed Jetty threads are also useful, as they tell you whether inter-service communication is being starved… that kind of thing.

Sorry to be so non-specific…

@krishnat I would tend to agree with Peter here. This is a complex issue that needs in-depth analysis, with special attention paid to metrics such as segment wait/scan times (as well as a host of others) and to your query workload.

A quick glance at your configs tells me that you have over-subscribed the processing threads on the Historicals. That can be done on a case-by-case basis, but only with proper testing. I'd reduce druid.processing.numThreads to 15 (16 cores minus 1), reduce the query timeout to about 120 seconds to free the cluster from long-running heavy queries, and take another look at the connection pooling guidelines.
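
As a concrete sketch of those two changes (starting points to test against your workload, not definitive values):

# Historical runtime.properties: processing threads = cores - 1 on a 16 vCPU i3.4xlarge
druid.processing.numThreads=15

# Broker runtime.properties: time out heavy queries after ~2 minutes (value in milliseconds)
druid.server.http.defaultQueryTimeout=120000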
