Kafka indexer read limit rate

Hi,

We had an issue where our Druid cluster was down for some time, and when it came back up and had to read the backlog of messages from Kafka,

the indexing tasks started running into a lot of memory issues, presumably because Druid had to "catch up".

Druid is configured with a specific taskDuration and usually reads about 1,000 messages during that taskDuration, but when the Kafka indexer

needs to "catch up" it can receive 100,000 messages in the same taskDuration, since there is no read rate limit.

So, two questions:

1 - Is there any config in Druid to limit the read rate (messages per second) for the new Kafka indexer?

2 - If not, could that be added? (Spark has something similar for its Kafka consumer, to give more control over backpressure.)

Thanks,

Anybody ?

The Kafka index task itself has no concept of rate limiting.

However, it accepts arbitrary Kafka consumer properties, which could possibly be used to throttle throughput somewhat. You can constrain the number of records (e.g. "max.poll.records") or the number of bytes that the consumer will return in each request. Setting those quite low might limit your throughput to something manageable.
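As a minimal sketch, those consumer properties would go in the "consumerProperties" section of the supervisor spec's ioConfig. The topic name, broker address, and the specific values below are placeholders, not recommendations; "max.poll.records" caps records per poll and "max.partition.fetch.bytes" caps bytes fetched per partition per request, so lowering them from their defaults (500 and 1048576 respectively) should reduce how much data each poll can pull in during catch-up:

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "my-topic",
    "consumerProperties": {
      "bootstrap.servers": "kafka01:9092",
      "max.poll.records": "100",
      "max.partition.fetch.bytes": "262144"
    },
    "taskDuration": "PT1H"
  }
}
```

Note this bounds the size of each poll, not the overall rate: a task that is far behind will still poll in a tight loop, just in smaller bites, so it smooths memory pressure rather than enforcing a true messages-per-second limit.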