Druid counts Kafka messages multiple times

Hi all,

I used the “select” query type to query the event details of a data source. At a certain timestamp, I found that the event count is 4 times larger than the number of messages actually stored in Kafka. For example, Kafka has just 1 message, but Druid reports an event count of 4.

Could the Kafka messages be resent to Druid, so that Druid counts them multiple times?



What select query are you sending, and what is your indexing JSON spec?

– Himanshu

I’d also like to add that Kafka consumers do not guarantee exactly-once delivery of messages. Kafka provides at-least-once delivery. So if you have a poor network where acks are lost, Kafka will resend messages and generate duplicates.
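
To make the lost-ack scenario concrete, here is a minimal simulation in plain Python. It does not use the Kafka or Druid APIs, and it is not a diagnosis of this particular setup. The function names, the `id` field, and the pattern of lost acks are all assumptions made up for illustration. It only shows how one logical message can end up stored 4 times when each write succeeds but the first three acks never reach the sender, and how a unique id would let a consumer collapse the copies.

```python
def send_with_retries(log, message, ack_outcomes):
    """Append `message` to `log` on every attempt; retry until an ack arrives.

    `ack_outcomes` is an iterable of booleans: True means the ack reached
    the sender, False means the ack was lost after the write succeeded.
    """
    attempts = 0
    for ack_received in ack_outcomes:
        log.append(message)  # the write itself succeeded
        attempts += 1
        if ack_received:
            break  # the sender saw the ack and stops retrying
    return attempts


def deduplicate(log):
    """Keep the first occurrence of each message id (hypothetical `id` field)."""
    seen = set()
    unique = []
    for msg in log:
        if msg["id"] not in seen:
            seen.add(msg["id"])
            unique.append(msg)
    return unique


log = []
# Hypothetical event; the first three acks are lost, the fourth arrives.
event = {"id": "evt-1", "timestamp": "2016-01-01T00:00:00Z"}
attempts = send_with_retries(log, event, [False, False, False, True])

print(len(log))               # number of stored copies
print(len(deduplicate(log)))  # number of distinct events
```

If the events carry a unique id like this, comparing the raw count with the distinct count at the affected timestamp is one way to check whether duplicates explain the difference.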