[druid-user] Getting UNABLE_TO_CONNECT_TO_STREAM when resuming Kafka Supervisor

I have a Kafka datasource. I had to update the ingestion spec, so I submitted the updated spec via a POST call. This caused the currently running index_kafka tasks for this source to go into the segment hand-off phase, but no new index_kafka tasks were created. The supervisor then moved to the UNABLE_TO_CONNECT_TO_STREAM state. So I suspended the supervisor and tried resuming it; it stays in CREATING_TASKS for a while but eventually moves to UNABLE_TO_CONNECT_TO_STREAM. In the Overlord log, I do see a WARN message saying:
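For anyone reproducing this, the submit / suspend / resume / status steps above map onto the Overlord's supervisor endpoints under `/druid/indexer/v1/supervisor`. The sketch below only builds the requests so they can be inspected before sending. The Overlord address is a hypothetical placeholder, and the supervisor ID is assumed to equal the datasource name (Druid's default for Kafka supervisors).

```python
import json
import urllib.request

# Hypothetical Overlord address; replace with your own (8090 is the default Overlord port).
OVERLORD = "http://localhost:8090"
BASE = "/druid/indexer/v1/supervisor"


def supervisor_request(action, supervisor_id=None, spec=None):
    """Build (but do not send) a request against the Overlord supervisor API.

    action is one of: "submit", "status", "suspend", "resume", "reset".
    """
    if action == "submit":
        # POST the full (updated) supervisor spec; Druid replaces the running spec.
        body = json.dumps(spec).encode("utf-8")
        return urllib.request.Request(
            OVERLORD + BASE,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
    if supervisor_id is None:
        raise ValueError("supervisor_id is required for " + action)
    if action == "status":
        return urllib.request.Request(
            f"{OVERLORD}{BASE}/{supervisor_id}/status", method="GET"
        )
    if action in ("suspend", "resume", "reset"):
        # Note: reset discards the supervisor's stored stream offsets,
        # which is why it is risky to use in production.
        return urllib.request.Request(
            f"{OVERLORD}{BASE}/{supervisor_id}/{action}", method="POST"
        )
    raise ValueError("unknown action: " + action)


def send(req):
    """Send a built request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(supervisor_request("status", "poc_kafka_source_hourly"))` fetches the current supervisor state, and the same helper with `"suspend"` or `"resume"` reproduces the suspend/resume cycle.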

WARN [KafkaSupervisor-poc_kafka_source_hourly] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Exception in supervisor run loop for dataSource [poc_kafka_source_hourly]

It doesn’t print the complete stack trace for the exception. Resetting the supervisor eventually made it work, but we cannot do that in production.
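Since the WARN line omits the exception details, one place worth checking is the supervisor status response, which in recent Druid versions lists recent run-loop errors. The sketch below assumes a response shaped like `{"payload": {"state", "detailedState", "healthy", "recentErrors": [...]}}`; these field names are an assumption and may differ across versions. As far as I recall, full stack traces are only included when the Overlord property `druid.supervisor.storeStackTrace` is enabled, so the optional `stackTrace` field is handled defensively.

```python
def summarize_status(status):
    """Pull the debugging-relevant fields out of a supervisor status response.

    Assumes the shape {"payload": {"state": ..., "detailedState": ...,
    "healthy": ..., "recentErrors": [...]}}; field names may vary by version.
    """
    payload = status.get("payload", {})
    lines = [
        f"state={payload.get('state')} "
        f"detailedState={payload.get('detailedState')} "
        f"healthy={payload.get('healthy')}"
    ]
    # Each entry is assumed to describe one exception caught in the supervisor run loop.
    for err in payload.get("recentErrors", []):
        lines.append(
            f"{err.get('timestamp')} {err.get('exceptionClass')}: {err.get('message')}"
        )
        # Present only if the Overlord is configured to store stack traces.
        if err.get("stackTrace"):
            lines.append(err["stackTrace"])
    return "\n".join(lines)
```

Feeding it the decoded JSON from the `/status` endpoint gives a compact view of the state plus whatever exception class and message the truncated log line left out.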

Are there any errors in the other logs, such as the MiddleManager or Historical logs?

Interesting, as that state seems to imply that Druid doesn’t think it has ever connected to that stream:

Does that make sense in your scenario?

And +1 to Rachel on the other log files…