Workaround after ingestion failure


I was going through the following issue:

Can someone comment: if we run into such an issue, how can we recover without having to deploy new code?

In my case, I am seeing the following error. There was a failure in the ZooKeeper cluster, and the MiddleManager service went down.

2019-12-30 00:09:52 WARN [KafkaSupervisor-flowlogs-Reporting-0] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - Lag metric: Kafka partitions [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39] do not match task partitions

2019-12-30 00:09:57 INFO [KafkaSupervisor-flowlogs] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [flowlogs] supervisor is running.

2019-12-30 00:09:57 INFO [KafkaSupervisor-flowlogs] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Creating new task group [0] for partitions [0, 32, 2, 34, 4, 36, 6, 38, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]

2019-12-30 00:09:57 ERROR [KafkaSupervisor-flowlogs] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - SeekableStreamSupervisor[flowlogs] failed to handle notice: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor, exceptionType=class, exceptionMessage=Expected instance of org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers, got org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers, noticeClass=RunNotice} Expected instance of org.apache.druid.indexing.seekablestream.SeekableStreamEndSequenceNumbers, got org.apache.druid.indexing.seekablestream.SeekableStreamStartSequenceNumbers



Hey Dhiman,

I’m not totally sure based on your logs (they don’t paint the full picture), but this might be related to a known issue. At any rate, please try updating to the latest version of Druid, where this and other bugs related to start vs. end sequence numbers have been fixed.


Hi Gian,

The Druid cluster is a production setup, and any upgrade needs to go through a process.

There was a network change, and as a result the ZooKeeper cluster went down along with other Druid services. Those services were brought back up, but ever since, I have been seeing the error messages related to start and end sequence numbers. I have tried restarting different services, but the ingestion task keeps failing. Is there no way to bring the ingestion tasks back up? I am looking for a temporary workaround.



Hey Dhiman,

If you can’t upgrade, you might be able to fix this by manually editing the stored datasource metadata in the metadata store so that it is of the ‘end’ type rather than the ‘start’ type. Doing a manual reset might help too, although this will cause your ingestion to lose its place in the Kafka stream and reset to earliest or latest (which may or may not be acceptable to you).
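To make the first suggestion concrete, here is a rough sketch of the kind of transformation the manual edit involves. It assumes the committed metadata lives in the `druid_dataSource` table (columns `commit_metadata_payload` and `commit_metadata_sha1` in Druid’s default schema), that the payload JSON carries a Jackson type discriminator of `"start"` or `"end"` under `partitions`, that the start-typed payload has an extra `exclusivePartitions` field the end type lacks, and that the sha1 column is the hex SHA-1 of the payload bytes. Verify all of that against your own Druid version and a backed-up row before touching production data; the example payload below is fabricated for illustration:

```python
import hashlib
import json

def flip_start_to_end(payload_bytes):
    """Rewrite a committed Kafka datasource-metadata payload so its
    sequence numbers are tagged 'end' instead of 'start', and recompute
    the hex SHA-1 stored alongside it. Field names here are assumptions;
    inspect your own druid_dataSource row first."""
    meta = json.loads(payload_bytes)
    partitions = meta.get("partitions", {})
    if partitions.get("type") == "start":
        partitions["type"] = "end"
        # Assumption: the 'start' type carries an extra field that the
        # 'end' type does not know about; drop it if present.
        partitions.pop("exclusivePartitions", None)
    new_payload = json.dumps(meta, separators=(",", ":")).encode("utf-8")
    return new_payload, hashlib.sha1(new_payload).hexdigest()

# Fabricated payload shaped like the (assumed) stored metadata:
payload = json.dumps({
    "type": "kafka",
    "partitions": {
        "type": "start",
        "stream": "flowlogs",
        "partitionSequenceNumberMap": {"0": 1200, "1": 3400},
        "exclusivePartitions": [],
    },
}).encode("utf-8")

new_payload, new_sha1 = flip_start_to_end(payload)
print(json.loads(new_payload)["partitions"]["type"])  # end
```

You would run something like this against the payload read from the row, then write both the new payload and the new sha1 back, with the supervisor suspended and overlord quiesced. The second suggestion (a hard reset) is simpler: `POST /druid/indexer/v1/supervisor/flowlogs/reset` against the Overlord, at the cost of the supervisor losing its Kafka offsets and resuming from earliest/latest per your `useEarliestOffset` setting.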