Does the Druid Kafka indexing service handle duplicate data?

Hi all,

We are working on ingesting Kafka streaming data into Druid. As I understand it, Kafka can deliver duplicate messages because publishers retry to make sure no message is lost. Does Druid's Kafka indexing service already handle these duplicates? If not, how should we configure the ingestion spec?



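Hi Xuanyi, yes. Per the Druid docs, the Kafka indexing tasks "read events using Kafka's own partition and offset mechanism and are therefore able to provide guarantees of exactly-once ingestion." Note what that guarantee covers: each Kafka offset is ingested exactly once, even if a task fails and is retried. It does not deduplicate identical messages that a producer wrote to Kafka twice (at two different offsets); both copies of such a message will be ingested.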
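To make the offset-based guarantee concrete, here is a toy Python model. It is not Druid code, and `Store`, `run_task`, and `crash_after` are hypothetical names. As a simplification, it assumes the ingested rows and the per-partition offsets are committed together in one atomic step. A task that crashes before publishing is simply re-run from the last committed offsets, so no offset is counted twice. A message the producer wrote at two different offsets is still ingested twice.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for Druid's segment storage plus metadata store."""
    rows: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # partition -> next offset to read

    def publish(self, new_rows, new_offsets):
        # Assumption: rows and offsets are committed together (atomic in this toy).
        self.rows.extend(new_rows)
        self.committed.update(new_offsets)


def run_task(log, store, crash_after=None):
    """Read every partition from its last committed offset, then publish.

    If crash_after is set, the task 'dies' after buffering that many records,
    before publishing, so neither the rows nor the offsets are committed.
    """
    buffered, offsets = [], {}
    for partition, records in log.items():
        start = store.committed.get(partition, 0)
        for offset in range(start, len(records)):
            buffered.append(records[offset])
            offsets[partition] = offset + 1
            if crash_after is not None and len(buffered) >= crash_after:
                return  # crash: buffered rows and offsets are dropped together
    store.publish(buffered, offsets)


if __name__ == "__main__":
    # Partition 1 holds a producer-side duplicate: "c" at offsets 0 and 1.
    log = {0: ["a", "b"], 1: ["c", "c"]}
    store = Store()
    run_task(log, store, crash_after=2)  # fails before publishing
    run_task(log, store)                 # retry resumes from committed offsets
    run_task(log, store)                 # nothing new to read
    print(store.rows)  # ['a', 'b', 'c', 'c']
```

The failed attempt and the idle re-run add nothing, so "a" and "b" appear once each; the two "c" records sit at distinct offsets and both land. Removing that kind of duplicate would need a unique key in the data and deduplication outside this offset mechanism.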

Hope this answers your question.