Kafka indexing service creating empty shards

I use the Kafka indexing service to ingest about a thousand rows an hour, with these settings:
maxRowsInMemory=20000

druid.indexer.fork.property.druid.processing.buffer.sizeBytes=128000000
druid.indexer.fork.property.druid.processing.numThreads=3

druid.indexer.runner.javaOpts=-server -Xmx1g -XX:+UseG1GC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

All other values are left unchanged. Even so, I see this:

679 kB

shard 0 (1 of 0, numbered)

0 B

shard 1 (2 of 0, numbered)

This doesn’t quite make sense, especially when

targetPartitionSize:5000000

numShards:-1
Shouldn't it create another shard only after the first 5,000,000 post-aggregation rows in the same segment interval?

numShards and targetPartitionSize are properties that only apply to batch-type indexing tasks.

  • Jon

Hey Jon,

Sorry, my bad. I think I read that on the Hadoop-based indexer page.
But I do have maxRowsPerSegment=5000000.
I just want to know: why is the Kafka indexer creating empty shards? Or why is it creating a shard at all?

Suhas

Hi Suhas,

Stream-based ingestion (like Kafka indexing) creates segments that initially show up as 0B while they are still mutable, but then have a concrete size once they become immutable. You should find that this segment gets a real size after a while.

It does show a concrete size now. But it looks like an exact replica of the segment already present in that interval. Clicking on the interval displays two segments, with partitionNum 0 and 1. I didn’t ask for a replica to be created, so why is there one?

Some metadata for those two segments:

{"metadata":{"dataSource":"history","interval":"2018-06-05T23:00:00.000Z/2018-06-06T00:00:00.000Z","version":"2018-07-04T07:47:24.392Z","loadSpec":{"type":"local","path":"/req/druid-0.12.0/var/druid/segments/history/2018-06-05T23:00:00.000Z_2018-06-06T00:00:00.000Z/2018-07-04T07:47:24.392Z/1/index.zip"},"dimensions":"","metrics":"","shardSpec":{"type":"numbered","partitionNum":1,"partitions":0},"binaryVersion":9,"size":177383,"identifier":"history_2018-06-05T23:00:00.000Z_2018-06-06T00:00:00.000Z_2018-07-04T07:47:24.392Z_1"},"servers":["0.0.0.0:8083"]}

{"metadata":{"dataSource":"history","interval":"2018-06-05T23:00:00.000Z/2018-06-06T00:00:00.000Z","version":"2018-07-04T07:47:24.392Z","loadSpec":{"type":"local","path":"/req/druid-0.12.0/var/druid/segments/history/2018-06-05T23:00:00.000Z_2018-06-06T00:00:00.000Z/2018-07-04T07:47:24.392Z/0/index.zip"},"dimensions":"","metrics":"","shardSpec":{"type":"numbered","partitionNum":0,"partitions":0},"binaryVersion":9,"size":177383,"identifier":"user_history_2018-06-05T23:00:00.000Z_2018-06-06T00:00:00.000Z_2018-07-04T07:47:24.392Z"},"servers":["0.0.0.0:8083"]}
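
One way to check whether the two entries above are copies of one segment or separate partitions is to compare their dataSource, interval, version, and shardSpec.partitionNum. In Druid, segments that share the first three but differ in partitionNum are separate partitions of the same time chunk, whereas a replica is the same segment loaded on more than one server. Below is a minimal Python sketch, assuming the two records are trimmed to those fields. The helper classify_pair is hypothetical and not part of any Druid API, and an identical size does not by itself show whether the contents are identical.

```python
# Segment descriptors trimmed to the fields compared below; the values are
# copied from the two metadata records posted above.
INTERVAL = "2018-06-05T23:00:00.000Z/2018-06-06T00:00:00.000Z"
VERSION = "2018-07-04T07:47:24.392Z"

seg0 = {
    "dataSource": "history",
    "interval": INTERVAL,
    "version": VERSION,
    "shardSpec": {"type": "numbered", "partitionNum": 0, "partitions": 0},
    "size": 177383,
}
seg1 = {
    "dataSource": "history",
    "interval": INTERVAL,
    "version": VERSION,
    "shardSpec": {"type": "numbered", "partitionNum": 1, "partitions": 0},
    "size": 177383,
}


def classify_pair(a, b):
    """Hypothetical helper (not a Druid API): describe how two segment
    descriptors relate, based only on their metadata fields."""
    same_chunk = all(a[k] == b[k] for k in ("dataSource", "interval", "version"))
    if not same_chunk:
        return "different time chunks or versions"
    if a["shardSpec"]["partitionNum"] == b["shardSpec"]["partitionNum"]:
        return "same partition"
    return "distinct partitions"


print(classify_pair(seg0, seg1))  # prints "distinct partitions"
```

By this check the two records are partitions 0 and 1 of the same hour, not two copies of one segment. Whether they hold the same rows would have to be confirmed by querying the data.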


Thanks, Gian.