Kafka Indexing task failover

Hi Druid,

Have a question about Kafka indexing task failover. Assume taskCount=2 and replicas=1, and the two tasks run on two different machines (A and B). If machine B crashes and task B goes away, will a new task B be restarted immediately on machine A, before the end of taskDuration?



Hi Xuanyi,
As per https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion – see the capacity planning section on that page.

One more parameter also plays a role here: druid.worker.capacity.

If enough capacity is available on machine A, the task should theoretically be restarted there.
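To make the capacity-planning point concrete, here is a small sketch of the arithmetic described in the Druid docs: reading tasks occupy taskCount * replicas worker slots, and during a task rollover the outgoing tasks keep their slots while publishing, so peak usage can briefly double. The `worker_capacity` value below is an assumed example, not a recommendation:

```python
# Sketch of the Kafka ingestion capacity-planning arithmetic from the
# Druid docs (illustrative, not Druid code).

def reading_slots(task_count: int, replicas: int) -> int:
    """Worker slots held by actively-reading Kafka index tasks."""
    return task_count * replicas

def peak_slots(task_count: int, replicas: int) -> int:
    """Worst-case slots needed while old tasks publish and new ones read."""
    return 2 * task_count * replicas

# The scenario in this thread: taskCount=2, replicas=1, two MiddleManagers,
# each with druid.worker.capacity slots (value below is assumed).
worker_capacity = 3

print(reading_slots(2, 1))   # 2 slots while reading
print(peak_slots(2, 1))      # 4 slots at rollover
# If machine B dies, machine A alone must have enough free slots:
print(peak_slots(2, 1) <= worker_capacity)  # False -> the task may queue
```

If the surviving machine does not have enough free slots, the replacement task waits in the pending queue rather than failing outright.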

Are you seeing any issues with respect to that?



Hi Siva,

I am trying to use an AWS Auto Scaling group for the MiddleManager nodes. When it scales down and forces a running task to be killed, I am concerned about missing data. But as you said, the killed task can be restarted if there is sufficient worker capacity, so it is safe to adopt AWS auto-scaling.

The reason I don’t want to use Druid's built-in auto-scaling is that it is task based. Since taskCount is fixed in the supervisor spec, I figure it couldn't handle a spike in throughput. Please correct me if I am wrong.
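The concern about a fixed taskCount can be sketched as follows. Each Kafka index task reads a fixed subset of partitions, so ingestion parallelism is capped at min(taskCount, partitions); absorbing a sustained spike means resubmitting the supervisor spec with a larger taskCount. The function names and throughput numbers below are purely illustrative assumptions, not a Druid API:

```python
# Toy model of why a fixed taskCount caps throughput (illustrative only).
import math

def effective_parallelism(task_count: int, partitions: int) -> int:
    """Ingestion parallelism is bounded by both taskCount and partitions."""
    return min(task_count, partitions)

def task_count_for(msgs_per_sec: float, per_task_capacity: float,
                   partitions: int) -> int:
    """Hypothetical sizing: tasks needed for a given throughput."""
    needed = math.ceil(msgs_per_sec / per_task_capacity)
    return min(max(needed, 1), partitions)

print(effective_parallelism(2, 8))        # 2: the fixed taskCount caps it
print(task_count_for(40_000, 10_000, 8))  # 4 tasks for a 4x spike
print(task_count_for(200_000, 10_000, 8)) # 8: capped at the partition count
```

So with taskCount=2 fixed in the spec, a 4x spike cannot be absorbed by adding more tasks unless the spec itself is updated.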

Just to re-confirm: the failed task could theoretically be restarted soon, even with replicas=1, before the taskDuration timeout.



Hi Xuanyi,
You can continue to use the AWS Auto Scaling group. No issues with that.

Yes, the killed task can be restarted if there is sufficient worker capacity.

In Kafka ingestion, the druid_dataSource metadata table stores the offset up to which the stream has been read.

The replacement task continues reading from that next offset, based on the entry in the druid_dataSource table.

Ideally, the data should not have gaps.
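To illustrate why there should be no gaps, here is a toy model of the resume behavior described above. This is not Druid's actual code; the dictionary simply stands in for the druid_dataSource metadata table:

```python
# Toy model of offset checkpoint/resume (not Druid internals): the
# metadata store records the next offset per partition, and a replacement
# task starts reading from exactly that point.

metadata_store = {}  # stands in for the druid_dataSource metadata table

def commit(partition: int, next_offset: int) -> None:
    """Record that everything before next_offset is durably ingested."""
    metadata_store[partition] = next_offset

def resume_offset(partition: int) -> int:
    """Where a replacement task should start reading."""
    return metadata_store.get(partition, 0)

# Task B ingests offsets 0..99 of partition 1, commits, then is killed.
commit(partition=1, next_offset=100)
# The replacement task (possibly on machine A) picks up at offset 100:
print(resume_offset(1))  # 100 -- no records skipped, so no gap
```

Because the offset is only advanced after the data is durably committed, a killed task can at worst re-read some records, not skip them.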

Hope this helps.

Thank you.