Kafka indexing service is losing records

Hi,
I’m doing some tests with the Kafka indexing service.

Here is my setup:

  1. I created a local Kafka instance.

  2. I created the Kafka topic xxx:

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic xxx --partitions 1 --replication-factor 1

  3. I started reading the topic with kafka-console-consumer, saving every message to a file:

$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic xxx --from-beginning > saved

  4. I configured the Kafka indexing service with this supervisor spec:

    {
      "type": "kafka",
      "dataSchema": {
        "dataSource": "xxx",
        "parser": {
          "type": "string",
          "parseSpec": {
            "format": "tsv",
            "columns": ["dim", "ts"],
            "timestampSpec": {
              "column": "ts",
              "format": "millis"
            },
            "dimensionsSpec": {
              "dimensions": ["dim"],
              "dimensionExclusions": []
            }
          }
        },
        "metricsSpec": [
          {
            "name": "count",
            "type": "count"
          }
        ],
        "granularitySpec": {
          "type": "uniform",
          "segmentGranularity": "HOUR",
          "queryGranularity": "NONE"
        }
      },
      "tuningConfig": {
        "type": "kafka",
        "maxRowsPerSegment": 750000,
        "buildV9Directly": true
      },
      "ioConfig": {
        "topic": "xxx",
        "consumerProperties": {
          "bootstrap.servers": "localhost:9092"
        },
        "taskCount": 1,
        "taskDuration": "PT1M"
      }
    }
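The post does not say how the spec was submitted. One common way is to POST it to the Overlord's supervisor endpoint, `/druid/indexer/v1/supervisor`. The sketch below assumes the Overlord listens on `localhost:8090` (its default port) and that the spec above was saved to a file; the file name and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the Overlord is reachable on localhost at its default port, 8090.
OVERLORD_URL = "http://localhost:8090"


def build_supervisor_request(spec: dict, overlord_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build (without sending) a POST of a supervisor spec to the Overlord."""
    return urllib.request.Request(
        url=overlord_url + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_supervisor(path: str) -> str:
    """Load a spec from a JSON file (hypothetical path), submit it, return the response body."""
    with open(path, encoding="utf-8") as f:
        spec = json.load(f)
    with urllib.request.urlopen(build_supervisor_request(spec)) as resp:
        return resp.read().decode("utf-8")
```

For example, `submit_supervisor("supervisor-spec.json")` would send the spec above, assuming it was saved under that (hypothetical) name. Building the request separately from sending it makes it easy to inspect the URL and payload before anything reaches the cluster.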

  5. With both consumers running, I pushed a 500-line file to the topic 200 times with kafka-console-producer:

$ wc -l test

500 test

$ for i in $(seq 1 200); do echo $i; cat test | bin/kafka-console-producer.sh --broker-list localhost:9092 --sync --topic xxx --old-producer; done
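The contents of the original `test` file are not shown. To reproduce the experiment, a sketch like the one below can generate a stand-in file in the `dim<TAB>ts` layout the parseSpec expects (TSV, columns `dim` and `ts`, timestamps in epoch milliseconds). The dimension values, start time, and one-row-per-second spacing are all hypothetical choices.

```python
from typing import List


def make_test_lines(n_rows: int, start_ms: int) -> List[str]:
    """Build TSV lines of the form dim<TAB>ts matching the parseSpec columns.

    Hypothetical data: ten distinct dimension values, one row per second
    starting at start_ms (epoch milliseconds).
    """
    return [f"d{i % 10}\t{start_ms + i * 1000}" for i in range(n_rows)]


# Write a hypothetical 500-line "test" file, the size reported by `wc -l test`.
lines = make_test_lines(500, 1_483_228_800_000)  # 2017-01-01T00:00:00Z
with open("test", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

# Replaying the file 200 times should put 500 * 200 messages on the topic.
expected_messages = len(lines) * 200
print(expected_messages)  # 100000
```

Because the same file is replayed, every `(dim, ts)` pair appears 200 times on the topic, which matters for how Druid stores the rows.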

  6. Results:

$ wc -l saved

100000

Druid:

!connect jdbc:calcite:schemaFactory=org.apache.calcite.adapter.druid.DruidSchemaFactory admin admin

0: jdbc:calcite:schemaFactory=org.apache.calc> select count(*) from "xxx";

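Odds are you are seeing the effects of rollup. Try SUM("count") rather than COUNT(*). With rollup, Druid stores one row per distinct combination of timestamp (truncated to queryGranularity) and dimension values, so COUNT(*) counts stored rows, while the "count" metric records how many ingested events each stored row represents.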

Thanks.

0: jdbc:calcite:schemaFactory=org.apache.calc> select sum("count") from "xxx";
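To see why the two queries can disagree, here is a toy model of perfect rollup in plain Python. It is not Druid's implementation, and the input data is hypothetical: a 500-line file of `(dim, ts)` pairs replayed 200 times, mirroring the producer loop. Rows sharing the same truncated timestamp and dimension value collapse into one stored row whose `count` is the number of events merged.

```python
from collections import Counter
from typing import Iterable, Tuple


def truncate(ts_ms: int, granularity_ms: int) -> int:
    """Floor a millisecond timestamp to the query granularity (1 ms behaves like NONE)."""
    return ts_ms - ts_ms % granularity_ms


def rollup(events: Iterable[Tuple[str, int]], granularity_ms: int = 1) -> Counter:
    """Perfect rollup: one stored row per (truncated ts, dim); the value is the count metric."""
    return Counter((truncate(ts, granularity_ms), dim) for dim, ts in events)


# Hypothetical input: 500 distinct (dim, ts) pairs, replayed 200 times.
base = [(f"d{i % 10}", 1_483_228_800_000 + i * 1000) for i in range(500)]
events = base * 200

rows = rollup(events)
count_star = len(rows)          # what COUNT(*) sees: stored rows
sum_count = sum(rows.values())  # what SUM("count") sees: ingested events
print(count_star, sum_count)    # 500 100000
```

Streaming ingestion such as the Kafka indexing service only guarantees best-effort rollup, and with `taskDuration` set to `PT1M` the same keys can land in segments from different tasks, so the real COUNT(*) may sit anywhere between the perfectly rolled-up figure and the number of events. SUM("count") is the figure to compare against the 100000 lines in `saved`.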