Druid Kafka ingestion throughput

Hi All,

We have a Druid cluster set up with the following configuration:

2 Historical nodes with 96 CPUs and 512 GB RAM in total
1 Master node: 16 CPUs, 64 GB RAM (standalone ZooKeeper runs here)
1 Query node: 8 CPUs, 32 GB RAM

Kafka: 3 × (16 CPUs, 64 GB RAM), with standalone ZooKeeper

We have 100-byte events being ingested into a Kafka topic with 30 partitions at a rate of 1M events/s.
We were able to achieve 1M/s in Druid using 48 CPUs and 400 GB RAM (when we tried scaling down the infrastructure).
With the infrastructure above, we expected somewhat higher throughput, around 1.3-1.4M/s, from 40-50 Kafka topic partitions, but the throughput does not seem to go above 1M/s on average.

Is there any specific reason why this is happening?
Any help would be appreciated.

Kafka and Druid are running with default settings, following the basic cluster tuning guide.

Regards,
Poonam

What is your task count set to?

https://druid.staged.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#capacity-planning

It is set equal to the number of partitions: 30 when there are 30 partitions, and 40 when there are 40.
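For reference, the capacity-planning section linked above gives a rule of thumb of `2 * replicas * taskCount` worker slots, so that reading and publishing tasks can run at the same time. A minimal sketch of that arithmetic for the task counts mentioned here, assuming `replicas` is left at 1 (the thread does not say what it is set to) and using a made-up helper name:

```python
def min_worker_slots(task_count, replicas=1):
    """Rule-of-thumb worker capacity from the Druid Kafka ingestion docs.

    Reading tasks use replicas * task_count slots, and the same number
    again is needed so publishing tasks can overlap with the next readers.
    replicas=1 is an assumption; it is not stated in this thread.
    """
    return 2 * replicas * task_count


for task_count in (30, 40):
    slots = min_worker_slots(task_count)
    print(f"taskCount={task_count}: at least {slots} worker slots")
```

If the total worker capacity across the data servers is lower than this number, some tasks would wait for a slot, so it may be worth comparing it with the configured `druid.worker.capacity`.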

Does it improve when you have more partitions? How does the OS look during the run? Any bottlenecks?

No, that is the issue we are facing. Even when we increase the partitions and the load, the throughput stays about the same.

We have tried the following configurations for the data servers so far:

48 CPU, 400 GB RAM, 30 partitions: load from Kafka 1M/s, Druid ingestion throughput 1M/s

96 CPU, 512 GB RAM, 40 partitions: load from Kafka 1.4M/s, Druid ingestion throughput 1.06M/s

Load from Kafka 2M/s: Druid ingestion throughput was 1.14M/s
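To make these runs easier to compare, here is a rough sketch that normalises the first two of them per task and per CPU. It assumes one task per partition (as stated earlier) and 100-byte events; the function names are only illustrative, and the 2M/s run is left out because its partition count is not given.

```python
EVENT_BYTES = 100  # event size stated earlier in the thread

# (cpus, partitions, druid_events_per_sec) for the first two runs above
runs = [
    (48, 30, 1_000_000),
    (96, 40, 1_060_000),
]


def per_task_rate(partitions, events_per_sec):
    # One ingestion task per partition, so tasks == partitions.
    return events_per_sec / partitions


def per_cpu_rate(cpus, events_per_sec):
    return events_per_sec / cpus


def megabytes_per_sec(events_per_sec, event_bytes=EVENT_BYTES):
    return events_per_sec * event_bytes / 1_000_000


for cpus, partitions, rate in runs:
    print(
        f"{cpus} CPU / {partitions} tasks: "
        f"{per_task_rate(partitions, rate):,.0f} events/s per task, "
        f"{per_cpu_rate(cpus, rate):,.0f} events/s per CPU, "
        f"{megabytes_per_sec(rate):.0f} MB/s total"
    )
```

Per-task throughput drops from roughly 33k to 26.5k events/s between the two runs, while the total barely moves, which suggests the limit is something shared across tasks rather than the number of partitions.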

In the experiments above, we observed CPU and memory usage above 90-95% on average.

So we first tried increasing the memory to see how it behaves:

96 CPU, 560 GB RAM, 40 partitions: load from Kafka 1.4M/s, Druid ingestion throughput 1.06M/s

Here CPU was hitting around 90%, but memory was around 75-80% on average.

Then we tried increasing the CPU count from 96 to 120.

Here ingestion throughput degraded to 0.9M/s.
CPU was at its peak, but memory was not.

So we are trying to figure out what the bottleneck is here: CPU, RAM, or the VM configuration.

Regards,
Poonam

Hey Poonam :)

In the console, you can see the single supervisor and all of the tasks - are all the tasks healthy?
And it looks like you’re saying that even if you have a taskCount of just 2 or 3 in the supervisor spec, you would be getting the same throughput - is that right?
https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html
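If it is easier than clicking through the console, the same health information can be pulled from the supervisor status API. This is only a sketch: the base URL and supervisor id in the usage note are placeholders, and the payload field names (`state`, `healthy`, `aggregateLag`, `activeTasks`) are what I would expect from the status response, so they are read defensively in case your version differs.

```python
import json
import urllib.request


def status_url(base_url, supervisor_id):
    # Supervisor status endpoint, reachable through the Router or Overlord.
    return f"{base_url.rstrip('/')}/druid/indexer/v1/supervisor/{supervisor_id}/status"


def fetch_status(base_url, supervisor_id):
    # Performs the HTTP GET; only call this against a reachable cluster.
    with urllib.request.urlopen(status_url(base_url, supervisor_id)) as resp:
        return json.load(resp)


def summarize(status):
    # Field names are assumptions about the response shape, hence .get().
    payload = status.get("payload", {})
    return {
        "state": payload.get("state"),
        "healthy": payload.get("healthy"),
        "aggregateLag": payload.get("aggregateLag"),
        "activeTasks": len(payload.get("activeTasks", [])),
    }
```

Usage would look like `summarize(fetch_status("http://router-host:8888", "my-datasource"))`, with both arguments replaced by your own values. A steadily growing `aggregateLag` while all tasks report healthy would point at the tasks not keeping up rather than at task failures.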

Is there any throttle quota on Kafka consumers in the Kafka cluster?