We are working on ingesting Kafka streaming data into Druid. As I understand it, the data from the Kafka brokers could be duplicated, since the publishers retry sends to ensure no message loss. Does Druid's Kafka indexing service already handle deduplication? If not, how should we set up the spec file?
Hi Xuanyi, Yes, per the Druid docs: "These indexing tasks read events using Kafka's own partition and offset mechanism and are therefore able to provide guarantees of exactly-once ingestion." https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html
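So you don't need any special deduplication settings for broker-level redelivery; the supervisor tracks Kafka offsets itself and won't ingest the same offset twice. One caveat: if a publisher retry actually produces a second copy of the record at a *different* offset (e.g. producer retries without idempotence enabled), Druid will see it as a distinct event. As a rough sketch, a minimal supervisor spec looks something like the following (the data source name, topic, broker address, and schema fields here are placeholders you'd adjust for your data; exact layout can vary by Druid version):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    },
    "ioConfig": {
      "topic": "my-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka-broker:9092" },
      "useEarliestOffset": true
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

You'd POST this to the Overlord's supervisor endpoint; the supervisor then manages the indexing tasks and offset bookkeeping for you.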
Hope this answers your question.