Druid 'Broker' in hung state after firing concurrent queries

Hi Team,

We have a Druid cluster with 10 Historical (i3.8xlarge) and 3 Broker (r5.12xlarge) nodes. When we fire around 50-60 concurrent queries, the broker goes into a hung state, and we have to restart the broker to get things running again.

Below are the historical and broker configuration details:

  1. Broker config

HTTP server threads

druid.broker.http.numConnections=100
druid.server.http.numThreads=50
druid.broker.http.readTimeout=PT5M

Processing threads and buffers

druid.processing.buffer.sizeBytes=2147483647
druid.processing.numThreads=60

Query cache

druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000

Query result cache

druid.broker.cache.useResultLevelCache=true
druid.broker.cache.populateResultLevelCache=true
druid.broker.cache.resultLevelCacheLimit=5242880
druid.broker.cache.unCacheable=

druid.sql.enable=true
druid.sql.http.enable=true

druid.processing.numMergeBuffers=20
druid.query.groupBy.defaultStrategy=v2
druid.query.groupBy.maxMergingDictionarySize=1000000000
druid.query.groupBy.maxOnDiskStorage=2000000000

2) **Historical configuration**

HTTP server threads

druid.server.http.numThreads=50

Processing threads and buffers

druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=31

Segment storage

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.segmentCache.locations=[{"path":"{path_to_cache}","maxSize":6000000000000}]
druid.server.maxSize=6000000000000

druid.query.groupBy.maxMergingDictionarySize=1000000000
druid.query.groupBy.maxOnDiskStorage=2000000000

Could someone please help me out with the below questions:

1) Right now, the cache is enabled on both the broker and historical nodes. Will it make any difference if the cache is maintained only on the historicals?

2) By switching to ‘caffeine’ cache, will there be any improvement?

3) How do I use only the 'query result' cache on the broker node?

4) Are there any configuration tweaks to the historical/broker settings above that would help improve query performance?

Thank you in advance.

Cheers,

Vinay

Unfortunately, Vinay, I don't have a solution to offer, but I am very interested in hearing about the solution if someone assists you.

I do want to know if you can locate the part of your broker log that corresponds to the time of the hung state. Would you be able to post it?

Hi Vinay,

I would highly encourage you to take a thread dump of the broker process when it is hung like that. There might be some lurking bug that needs to be fixed.
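
If it helps, one way to grab a dump while the process is hung is the standard JDK jstack tool (the pid below is a placeholder for your broker's process id):

jstack -l <broker_pid> > broker-threaddump.txt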

Also, you didn't mention which version of Druid you are running. It would also be interesting to know how many segments/shards your historical servers are hosting.

To enable higher concurrency and better user experience, there are a few things you could tune:

  1. Disable caching on the broker. In general it is recommended to enable caching on only one tier, either historicals or brokers, but preferably historicals (a combined config sketch follows after this list).

  2. Check ulimit -n. See if you are seeing 'sockets not available' or 'too many open files' exceptions in the broker or historical logs. Typically, to support greater concurrency you will need to bump it up to a higher number.

  3. Increase the number of HTTP threads via druid.server.http.numThreads. At the same time, you might also want to look at enabling druid.server.http.enableRequestLimit on the broker. Under heavy load, the latter returns an error telling the user the server is too busy, which may be preferable to users seeing their queries hang.

  4. By default the Jetty server queue is unbounded. Think about whether it makes sense to place a bound on this queue. This is controlled by druid.server.http.queueSize (see http://druid.io/docs/latest/configuration/index.html#historical-general-configuration).
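
To make items 1, 3 and 4 concrete, here is a rough sketch of the broker runtime.properties changes; the specific thread count and queue size are illustrative placeholders, not tuned recommendations:

# item 1: keep caching only on the historicals
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

# item 3: more HTTP server threads, and fail fast when the broker is saturated
druid.server.http.numThreads=80
druid.server.http.enableRequestLimit=true

# item 4: bound the Jetty request queue instead of leaving it unbounded
druid.server.http.queueSize=200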

Hope this helps.

Hi Vinay,

As Chris requested, please post the broker logs. Could you also post the jvm.config for both the historical and broker nodes?

**>> 2) By switching to 'caffeine' cache, will there be any improvement?**
Using caffeine is recommended as the local cache is deprecated. Look at the [docs](http://druid.io/docs/latest/configuration/index.html#cache-configuration) for more info. This may not be directly related to the issue at hand, but it should be changed.
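
A minimal sketch of that change on the broker, simply carrying over your existing cache size:

druid.cache.type=caffeine
druid.cache.sizeInBytes=2000000000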
In addition to trying Samarth's suggestions:
1. It is very helpful to look at the broker logs.
2. Capture CPU, I/O and memory stats using utilities like iostat or vmstat while the queries are running (example below). This might help find bottlenecks, if any, with CPU, I/O or memory.
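
For example (the 5-second sampling interval is arbitrary; run these on both brokers and historicals while the concurrent queries are in flight):

vmstat 5      # CPU, run queue, swap and memory pressure
iostat -x 5   # per-device I/O utilization and wait times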

Thanks,
Sashi

Below are the JVM configs for the historical and broker nodes. We are using Druid version 0.13.0.

**Broker JVM config:**

-server
-Xmx30g
-Xms20g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=178g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/cdp/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dlogfilename=broker

**Historical JVM config:**

-server
-Xmx32g
-Xms32g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:MaxDirectMemorySize=164g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir={location}
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dlogfilename=historical

Vinay,

Depending on the type of queries run and how much data is queried, the memory consumed by the Broker or Historical is a mixture of off-heap and heap. On the Historicals, there should be enough free memory left to map the segment files into memory; if that is too low, the Historical will do a lot of swapping of segments to accommodate larger queries. The broker hanging could be caused by waiting on Historicals (lots of GC pauses, oversubscribed threads, etc.) or by the broker itself having resource issues.
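
As a rough back-of-the-envelope check, the memory the OS has left for mapping segments is approximately total RAM minus heap minus direct memory. Assuming roughly 244 GB of RAM on an i3.8xlarge, your current historical settings leave about:

244 GB - 32 GB (Xmx) - 164 GB (MaxDirectMemorySize) ≈ 48 GB for the segment page cache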

I noticed that the broker is a bigger machine than the historical. It should be the other way around. I have re-calculated the memory configs given your machine types, and here's what came out.

Broker Config:

JVM config:

| Setting | Value |
| --- | --- |
| Xms | 30 GB |
| Xmx | 30 GB |
| MaxDirectMemorySize | 56 GB |
| Total Offheap | 318 GB |

Runtime properties:

| Property | Value |
| --- | --- |
| druid.processing.numThreads | 47 |
| druid.server.http.numThreads | 83 |

Historical Config:

JVM config:

| Setting | Minimum |
| --- | --- |
| Xms | 30 GB |
| Xmx | 30 GB |
| MaxDirectMemorySize | 37 GB |
| Recommended memory for segments | 186 GB |

Runtime properties:

| Property | Value |
| --- | --- |
| druid.processing.numThreads | 31 |
| druid.server.http.numThreads | 66 |

This setup balances the utilization of threads, off-heap memory and heap. I don't have visibility into your query pattern or the size of the data you are querying, but I'd recommend running an incremental load test at 10, 20, 30, 40, 50 and 60 simultaneous queries (see the sketch below) and observing how that impacts resources on the brokers and historicals. That should give you an idea of what to tweak and whether you need to expand your machines. I'd also recommend making the brokers smaller than the historicals, as the latter do the heavier work; brokers make connections to historicals, merge results, and handle group-bys.
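
As a rough illustration of such a load test against the broker's SQL endpoint (the host, port, datasource and query below are placeholders to replace with your own), something like this bash loop could be run with increasing values of N:

#!/bin/bash
# Fire N concurrent SQL queries at the broker and wait for all of them to return.
N=${1:-10}
BROKER=broker-host:8082   # placeholder broker host:port

for i in $(seq 1 "$N"); do
  curl -s -XPOST "http://${BROKER}/druid/v2/sql" \
    -H 'Content-Type: application/json' \
    -d '{"query":"SELECT COUNT(*) FROM your_datasource"}' > /dev/null &
done
wait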

Rommel Garcia

Thank you Rommel. I will try this configuration. If I swap the instance types used for the broker and historical nodes, how do I calculate the optimal configuration?

Regards,

Vinay Patil

Hi Vinay,

This document will help you: http://druid.io/docs/latest/configuration/index.html#historical.

If you swap the instance types, Historical would be r5.12xlarge (https://aws.amazon.com/ec2/instance-types/r5/).

Below is a guideline for how you could calculate the configurations. Please look at the above documentation for more details.

CPU - 48 cores

RAM - 384 GB

**JVM config**

| Setting | Minimum | Formula |
| --- | --- | --- |
| Xms | 30 GB | |
| Xmx | 30 GB | |
| MaxDirectMemorySize | 60 GB | druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1) |

**Runtime properties**

| Property | Value | Formula |
| --- | --- | --- |
| druid.processing.numMergeBuffers | 12 | default: max(2, druid.processing.numThreads / 4) |
| druid.processing.buffer.sizeBytes | 1073741824 bytes | from your config |
| druid.processing.numThreads | 47 | number of cores - 1 |
| druid.server.http.numThreads | 83 | max(10, (number of cores * 17) / 16 + 2) + 30 |
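
As a quick sanity check, plugging the 48 cores into those formulas reproduces the values in the table:

druid.processing.numThreads  = 48 - 1 = 47
druid.server.http.numThreads = max(10, (48 * 17) / 16 + 2) + 30 = 53 + 30 = 83
MaxDirectMemorySize          = 1073741824 * (12 + 47 + 1) = 64424509440 bytes ≈ 60 GB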

If you choose another instance type, you can work out the configurations in a similar fashion.

Thanks,

Sashi

Thank you Sashidhar. I will try this out.