Druid 0.10.0: Kafka Indexing Service ignores similar/identical data from Kafka

Hi,

we have found some strange behaviour in our Kafka Indexing Service: data seems to be ignored or thrown away!

Short description of our data pipeline:
A servlet receives data, processes it and stores it in a Kafka topic -> Kafka -> the Druid Kafka Indexing Service ingests the data from the topic

We built a test with Postman and sent 100 requests (one every 50 ms) with identical data (except for the timestamp):
-) The data looks something like this: { "time": 15202573xxxxx, "ak": "postman", "evt": "view", "cpr": null } (time is in milliseconds, as a long)
-) We checked the servlet log: all 100 requests arrived and were processed
-) We checked the Kafka topic with kafka-console-consumer: all 100 records landed in our topic
-) We checked the data with Druid: Bang! Druid reported a count of just 4 (FOUR)

My questions are:
Is this a bug or a feature?
Does Druid think this is duplicate data and ignore it?
Is there something wrong with my configuration?
How can we prevent this behaviour through configuration?

We also found that if we add an extra field (a generated UUID) that is different in every record, everything works fine and Druid counts correctly.
E.g.: { "time": 15202573xxxxx, "ak": "postman", "evt": "view", "cpr": "7E81C208-A6B1-4940-86B3-08E13C526AA0" } (where cpr is different in each of the 100 records)

Thanks, Alex

Hi Alex,

You are probably seeing the effects of rollup: http://druid.io/docs/latest/ingestion/schema-design.html#counting-the-number-of-ingested-events

By default, Druid will combine records that have identical dimension values and the same timestamp (after truncation to the queryGranularity) into a single row. The doc page above suggests using a count metric at ingest time to track the number of input events, and then summing over it at query time. This is the usual way people count the number of ingested events in Druid. You can also disable rollup by setting "rollup": false in your granularitySpec, although then you won’t get the storage footprint benefits.
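
To make the effect concrete, here is a rough Python sketch of what rollup does conceptually. It is a toy model, not Druid's actual implementation: the rollup() helper, the starting timestamp and the one-second granularity are all assumptions made up for illustration, so the row count it produces will not necessarily match the 4 you saw. The per-row counter plays the role of the ingest-time count metric, and summing it plays the role of the query-time sum.

```python
from collections import defaultdict


def rollup(events, granularity_ms, dimensions):
    """Toy model of ingest-time rollup (hypothetical helper, not Druid code)."""
    rows = defaultdict(int)
    for event in events:
        # Truncate the timestamp to the start of its granularity bucket.
        bucket = event["time"] - event["time"] % granularity_ms
        # Events with the same bucket and dimension values share one row.
        key = (bucket,) + tuple(event.get(d) for d in dimensions)
        rows[key] += 1  # stands in for a "count" metric defined at ingest time
    return dict(rows)


dims = ["ak", "evt", "cpr"]
start = 1_520_257_300_000  # made-up timestamp in milliseconds

# 100 events, 50 ms apart, identical except for the timestamp.
identical = [
    {"time": start + i * 50, "ak": "postman", "evt": "view", "cpr": None}
    for i in range(100)
]
rows = rollup(identical, granularity_ms=1000, dimensions=dims)
stored_rows = len(rows)               # what counting stored rows would report
ingested_events = sum(rows.values())  # what summing the count metric reports

# Same events, but with a unique "cpr" value per record (like the UUID test).
unique = [dict(e, cpr=f"id-{i}") for i, e in enumerate(identical)]
unique_rows = len(rollup(unique, granularity_ms=1000, dimensions=dims))

print(stored_rows, ingested_events, unique_rows)
```

In this model the identical events collapse into one row per time bucket, while the summed counter still recovers all 100 events. Adding a unique value per record gives every event its own row, which is consistent with what you observed with the UUID.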

Hi Gian,

After a little thinking and testing, we got it. We were just querying the wrong data!
On the plus side, we found out that rollup is working =)

Thanks a lot!

Alex