Kafka indexing service and replica tasks

Hello Druid community!

I submitted a Kafka indexing task using the following configuration for task count and replicas:

"taskCount": 3,
"replicas": 2

The Kafka topic containing the data has 3 partitions. From the coordinator console I can see that 6 running indexing tasks have been created and distributed across my two realtime nodes.

My question: will the published segments contain only the data read by 3 of the indexing tasks?

I would like to be sure that I will not have any duplicate data in the published segments.

Thanks for your help.

Florian

Hey Florian,

Yes, the replica tasks will generate segments with the same ID and only one of them will be used so you will not get duplicate data.
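
To make the arithmetic concrete, here is a rough sketch in Python of why 3 partitions with taskCount 3 and replicas 2 gives 6 running tasks but only 3 sets of published data. This is a simplified model, not Druid's actual code: the function names, the task names, the segment ID format, and the "partition modulo taskCount" grouping are all assumptions made for illustration.

```python
from collections import defaultdict


def plan_tasks(num_partitions, task_count, replicas):
    """Return (task_name, group_id, partitions) tuples for a toy supervisor.

    Assumption for illustration: partitions are spread over task groups
    by partition number modulo task_count.
    """
    groups = defaultdict(list)
    for partition in range(num_partitions):
        groups[partition % task_count].append(partition)

    tasks = []
    for group_id, partitions in sorted(groups.items()):
        # Every replica in a group reads exactly the same partitions.
        for replica in range(replicas):
            name = f"task_g{group_id}_r{replica}"
            tasks.append((name, group_id, tuple(partitions)))
    return tasks


def publish(tasks):
    """Keep one segment per segment ID, no matter how many replicas built it."""
    published = {}
    for name, group_id, partitions in tasks:
        # Hypothetical segment ID: replicas of a group derive the same ID
        # because they read the same partitions.
        segment_id = "segment_for_partitions_" + "_".join(map(str, partitions))
        published.setdefault(segment_id, name)
    return published


tasks = plan_tasks(num_partitions=3, task_count=3, replicas=2)
published = publish(tasks)
print(len(tasks))      # 6 running tasks
print(len(published))  # 3 distinct segment IDs
```

The point of the sketch is only that replicas within a group produce the same segment ID, so the count of published segments follows taskCount, not taskCount times replicas.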

FYI for terminology, the indexing tasks run on ‘worker nodes’ as ‘peon’ processes, and not realtime nodes (which are a different type of node not used by the indexing service). The worker nodes can accommodate both realtime and batch indexing tasks.

Florian

The replicas' data overwrites each other, and at the end of the day you do not end up with 2x the segments.