Records getting dropped using Kafka indexing service

Hi,

I’m using the Kafka indexing service for real-time ingestion. Over the past few days, I’ve seen records being dropped during ingestion. There are two scenarios where I’m having issues:

  1. I can see a record (event) on the real-time nodes, but in a few cases the record goes missing on the historical nodes.
  2. If I ingest a large volume of data (10,000 TPS) with historical timestamps, I can see the records on the real-time nodes (Superset confirms that I received x million). For example, Feb 1st shows 5 million records when I check at hour 0, but only 4 million when I check again at hour 1. Is there a case where the record count gets dropped?

Has anyone faced these scenarios? Where should I start debugging? Below is my Kafka indexing service config; let me know if you need more details.

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "DS_NAME",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [],
          "dimensionExclusions": []
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000,
    "resetOffsetAutomatically" : true
  },
  "ioConfig": {
    "topic": "KAFKA_TOPIC",
    "useEarliestOffset" : true,
    "consumerProperties": {
      "bootstrap.servers": "IP:9092",
      "group.id" : "GROUP"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}

```

It could be due to rollup (check http://druid.io/docs/latest/design/index.html for info on how rollup works in Druid).
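As a hypothetical illustration (the `page` dimension and timestamps below are made up, not from your data): with rollup, events that share the same timestamp (at your queryGranularity; with NONE that means the same millisecond) and the same dimension values are collapsed into a single Druid row, and your `count` metric records how many events were combined. So three raw events like

```json
[
  {"timestamp": "2018-02-01T00:00:00.000Z", "page": "home"},
  {"timestamp": "2018-02-01T00:00:00.000Z", "page": "home"},
  {"timestamp": "2018-02-01T00:00:00.000Z", "page": "home"}
]
```

would be stored as one row:

```json
{"timestamp": "2018-02-01T00:00:00.000Z", "page": "home", "count": 3}
```

That means the number of rows in Druid can be smaller than the number of events you ingested, even though no events were lost.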

At query time, are you using a “count” aggregator or a “longSum” aggregator to compute the event count? You should be using a longSum aggregator with fieldName: count. The type: count aggregator just counts the number of rows stored in Druid, and those rows may already be rolled up.
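As a sketch (the interval and output names here are just examples), a timeseries query like the following returns the ingested event count via longSum on your "count" metric, while the count aggregator alongside it only returns the number of rolled-up Druid rows:

```json
{
  "queryType": "timeseries",
  "dataSource": "DS_NAME",
  "intervals": ["2018-02-01T00:00:00Z/2018-02-02T00:00:00Z"],
  "granularity": "hour",
  "aggregations": [
    { "type": "longSum", "name": "event_count", "fieldName": "count" },
    { "type": "count", "name": "druid_rows" }
  ]
}
```

If event_count matches what you sent but druid_rows is lower, the difference is rollup rather than dropped data.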