Kafka Indexing Service - Multiple kafka topics per task

Any reason why the ability to handle multiple Kafka topics (using a pattern) was removed from the new Kafka indexing service? That would have been hugely useful for my use case. I have tens (eventually 100+) of Kafka topics that feed data to my Druid cluster. As it stands now, each topic needs to be handled by a different task, which means a worker per topic (not including the replicas and partitions). Each worker (if I understand it right) is a JVM, and the Kafka tasks attach to the worker for their lifetime, which is pretty much never-ending. That means a ton of resources just to run the Kafka indexing tasks.

Any thoughts on how I can work around this issue?

Thanks, Arul

Hey Arul,

There’s no technical reason multiple topics couldn’t be implemented in the indexing service, but it was omitted in the initial implementation because of the added complexity in supporting exactly-once ingestion across multiple topics. One possible solution is to add a stream processor (maybe look at Kafka Streams?) before Druid that will merge the different feeds into a single Kafka topic.
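To make the suggestion concrete, here is a minimal Kafka Streams sketch of the merge step. The application id, broker address, topic pattern, and merged topic name are all placeholders you would replace with your own; it assumes string keys and values, and it simply forwards every record from the matching source topics into one topic that a single Druid supervisor can then consume.

```java
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MergeTopicsForDruid {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "merge-for-druid");      // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Subscribe to every topic matching the pattern and forward all records,
        // unchanged, into the single merged topic that Druid will ingest from.
        KStream<String, String> merged = builder.stream(Pattern.compile("events-.*"));
        merged.to("merged-events");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Running this requires a Kafka cluster and the kafka-streams dependency on the classpath; it is a sketch of the topology, not a production configuration.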

Hi David,

Actually, we have the same resource-limitation problem after updating from batch indexing to the Kafka indexing service. Based on your reply, we can use Kafka Streams to merge multiple Kafka topics into a single one, but how can Druid create (index data into) multiple datasources from this single merged topic?

Right now, topics to datasources are mapped 1-to-1, and any joining or splitting of streams needs to be done before Druid ingestion. You should be able to use a stream processing technology to transform n original topics into m transformed topics which will map 1-to-1 with m Druid datasources.
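As a sketch of the n-to-m transform, Kafka Streams can also route records to different output topics dynamically; each output topic then gets its own Druid supervisor and datasource. The names below are placeholders, and the example assumes the record key carries whatever attribute decides the target datasource:

```java
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class RouteTopicsForDruid {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "route-for-druid");   // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from all n original topics matching the pattern.
        KStream<String, String> all = builder.stream(Pattern.compile("raw-.*"));
        // Route each record to one of m output topics, chosen per record;
        // here the key is assumed to name the target datasource.
        all.to((key, value, recordContext) -> "druid-" + key);

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Each resulting `druid-*` topic would then be ingested by its own Kafka supervisor, preserving the 1-to-1 topic-to-datasource mapping. (Dynamic routing via a `TopicNameExtractor` lambda requires Kafka Streams 2.0 or later.)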

Hi David,

Are there any plans in the roadmap for a single task to handle multiple topics?

Thanks David.

Will take a look at kafka streams.

Hey Jason,

Reading from multiple Kafka topics into a single datasource isn’t currently on the roadmap. If this is an important feature for you, could you raise an issue for it so that we can track it?

Thanks, David. But actually, what I was referring to is the ability for a single task to read from N Kafka topics into N datasources.

I have created an issue for this: https://github.com/druid-io/druid/issues/3752

I too have a feature request for the original question: support for a topic pattern in the Kafka indexing service extension. I have created this issue for it: https://github.com/druid-io/druid/issues/3945