Druid Broker Tuning

Hi Team,

We are trying to better understand the internals of each Druid component and how each one affects performance. Our use case is a groupBy query run by 50 concurrent users and repeated every 5 seconds. The query mostly hits the Middle Managers for real-time data. I see the broker becoming the bottleneck, yet looking at resource utilization, I don’t see any CPU or memory constraint on the broker node.

We would appreciate your input here. What other metrics should we look into?

Here are the configs we have experimented with; we gave each component generous resources. We are trying to make the most of the hardware. Is this just overkill, or can we improve performance further?

Hardware Config:

Memory: 512GB

CPU: 4 x quad-core (32 logical cores with Hyper-Threading)

Disk: HDD

Druid Cluster:

2 Brokers (dedicated node)

2 Historicals (dedicated node)

2 Middle Managers (the 1st node also runs the Overlord and Coordinator; the 2nd node also runs Superset and the Router)

Ingestion Task Config:

Segment Granularity: Day

Average Segment Size: 40MB (How to make the segment bigger in real-time ingestion?)

Kafka indexing task completion time: 1 hour

Max Rows per Segment: 5,000,000

16 Peon tasks with a replication factor of 2 (the Kafka topic has 8 partitions)
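For reference, here is how the task numbers above fit together. This is only a back-of-the-envelope sketch: it assumes the 16 peons are 8 reading tasks times 2 replicas, that tasks are spread evenly across the two Middle Managers, and that every peon gets the -Xmx15g heap listed under druid.indexer.runner.javaOpts further down. The variable names are illustrative, not Druid properties.

```python
# Rough check of the ingestion layout described in this post.
# All variable names are illustrative; none are Druid property names.
kafka_partitions = 8
replicas = 2
total_peons = 16
middle_managers = 2
peon_heap_gb = 15  # from -Xmx15g in druid.indexer.runner.javaOpts

# Assumption: total peons = reading tasks * replicas.
task_count = total_peons // replicas
partitions_per_task = kafka_partitions / task_count

# Assumption: peons are spread evenly across the Middle Manager nodes.
peons_per_node = total_peons // middle_managers
peon_heap_per_node_gb = peons_per_node * peon_heap_gb

print(f"reading tasks: {task_count}")
print(f"Kafka partitions per task: {partitions_per_task}")
print(f"peons per Middle Manager node: {peons_per_node}")
print(f"peon heap per node: {peon_heap_per_node_gb} GB")
```

Under these assumptions each task reads a single partition, and peon heap alone accounts for about 120 GB per Middle Manager node, before any direct memory.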

Node             Property                                 Value

Broker           druid.broker.jvm.direct.memory           100GB
Broker           druid.broker.jvm.heap.memory             60GB
Broker           druid.processing.numThreads              31
Broker           druid.server.http.numThreads             200 (Tested with 66 as well)
Broker           druid.processing.buffer.sizeBytes        2000000000
Broker           druid.broker.http.numConnections         300 (Tested with 20 as well)

Historical       druid.processing.numThreads              31
Historical       druid.server.http.numThreads             66
Historical       druid.historical.jvm.direct.memory       100GB
Historical       druid.processing.buffer.sizeBytes        2000000000
Historical       druid.historical.jvm.heap.memory         60GB

Middle Manager   druid.processing.numThreads              20
Middle Manager   druid.middlemanager.jvm.heap.memory      60GB
Middle Manager   druid.middlemanager.jvm.direct.memory    100GB
Middle Manager   druid.server.http.numThreads             50
Middle Manager   druid.processing.buffer.sizeBytes        512MB
Middle Manager   druid.indexer.runner.javaOpts            -Xmx15g
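As a sanity check on the direct memory settings above, the Druid documentation sizes direct memory as druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1). The sketch below applies that rule to the Broker and Historical values. It makes two assumptions that are not stated in this post: numMergeBuffers is left at its default, taken here as max(2, numThreads / 4), and "100GB" corresponds to a 100 GiB direct memory limit. The function names are hypothetical helpers, not part of Druid.

```python
GIB = 1024 ** 3


def default_merge_buffers(num_threads: int) -> int:
    # Assumption: druid.processing.numMergeBuffers is unset and
    # defaults to max(2, numThreads / 4) with integer division.
    return max(2, num_threads // 4)


def required_direct_bytes(buffer_bytes, num_threads, num_merge_buffers=None):
    # Documented sizing rule:
    # sizeBytes * (numThreads + numMergeBuffers + 1)
    if num_merge_buffers is None:
        num_merge_buffers = default_merge_buffers(num_threads)
    return buffer_bytes * (num_threads + num_merge_buffers + 1)


# (buffer.sizeBytes, processing.numThreads, assumed direct memory limit)
configs = {
    "Broker": (2_000_000_000, 31, 100 * GIB),
    "Historical": (2_000_000_000, 31, 100 * GIB),
}

for node, (buf, threads, direct) in configs.items():
    need = required_direct_bytes(buf, threads)
    status = "fits" if need <= direct else "exceeds limit"
    print(f"{node}: needs {need / GIB:.1f} GiB of {direct / GIB:.0f} GiB ({status})")
```

With these inputs the processing and merge buffers come to roughly 78 GB on each node, so the configured direct memory appears sufficient if the assumptions hold.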

Hi Team,

Just wanted to follow up here in case anyone can share their expertise on this topic. We are just trying to make the most of our boxes.