We are trying to better understand the internals of each Druid component and how each one affects performance. I have a use case where we run a groupBy query with 50 concurrent users, repeating every 5 seconds. The query mostly hits the MiddleManagers for real-time data. I see the Broker becoming the bottleneck, but looking at resource utilization I don't see any CPU or memory constraint on the Broker node.
I'd appreciate your input here. What other metrics should we look into?
Here are the configs we played with; we gave each component plenty of resources and are trying to make the most of them. Is this just overkill, or can we improve performance further?
CPU: 4 quad-core CPUs (32 cores with hyper-threading)
2 Brokers (dedicated node)
2 Historicals (dedicated node)
2 MiddleManagers (the 1st node also runs the Overlord and Coordinator; the 2nd node runs Superset and the Router)
Ingestion Task Config:
Segment granularity: DAY
Average segment size: 40MB (how can we make segments bigger with real-time ingestion?)
Kafka indexing task completion time: 1 hour
Max rows per segment = 5,000,000
16 Peon tasks with a replication factor of 2 (the Kafka topic has 8 partitions)
200 (Tested with 66 as well)
300 (Tested with 20 as well)
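For reference, the ingestion settings above would map to a Kafka supervisor spec roughly like the sketch below. This is only an approximation of our actual spec; the datasource and topic names are placeholders, and I'm assuming the standard supervisor property names (`taskCount` × `replicas` = 8 × 2 = 16 Peons, `taskDuration` of 1 hour):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "granularitySpec": {
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE"
      }
    },
    "tuningConfig": {
      "type": "kafka",
      "maxRowsPerSegment": 5000000
    },
    "ioConfig": {
      "topic": "example_topic",
      "taskCount": 8,
      "replicas": 2,
      "taskDuration": "PT1H"
    }
  }
}
```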