Kafka Ingestion Peons Show org.apache.kafka.common.errors.DisconnectException: null

  • Druid Version: 0.22.1
  • Kafka Ingestion (idempotent producer)

We recently started seeing log messages like the following from the Peons running our Kafka ingestion tasks:

2022-05-25T11:15:18,977 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-kafka-supervisor-mjocmnma-1, groupId=kafka-supervisor-mjocmnma] Error sending fetch request (sessionId=328171797, epoch=1) to node 3:
org.apache.kafka.common.errors.DisconnectException: null
2022-05-23T21:25:01,257 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-kafka-supervisor-dgnidfij-1, groupId=kafka-supervisor-dgnidfij] Error sending fetch request (sessionId=908182165, epoch=3) to node 3:
org.apache.kafka.common.errors.DisconnectException: null
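
Both sample lines report the failed fetch against node 3. Counting these messages per broker node and per consumer group can show whether the disconnects cluster on a single broker, and the extracted timestamps can be lined up against task failures. The sketch below is a minimal Python example. The regex is an assumption derived only from the two lines above (adjust it if your log4j layout differs), and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the two FetchSessionHandler sample lines above.
FETCH_ERROR = re.compile(
    r"^(?P<ts>\S+) \w+ \[[^\]]*\] org\.apache\.kafka\.clients\.FetchSessionHandler - "
    r"\[Consumer clientId=[^,]+, groupId=(?P<group>[^\]]+)\] "
    r"Error sending fetch request \(sessionId=\d+, epoch=\d+\) to node (?P<node>\d+):"
)


def fetch_errors(lines):
    """Yield (timestamp, consumer group, broker node id) for each fetch-error line."""
    for line in lines:
        m = FETCH_ERROR.match(line)
        if m:
            yield m["ts"], m["group"], int(m["node"])


def errors_per_node(lines):
    """Count fetch-error lines per broker node id."""
    return Counter(node for _, _, node in fetch_errors(lines))
```

For example, errors_per_node(open("task.log")) on a Peon task log (the file name is illustrative) returns a Counter keyed by broker node id. Lines that do not match, such as the DisconnectException line that follows each message, are skipped.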

This appears to happen intermittently, and producers/consumers in our other applications don't seem to hit the same issue. According to public posts in other forums (Stack Overflow, etc.), this error is usually recoverable (hence the INFO-level logging) and typically means that the fetch request to the Kafka broker timed out.

A common remedy for this type of error is to increase the consumer's request.timeout.ms. Could this issue be causing some of the intermittent failures we see? Is there a way to change the Kafka consumer settings to use a higher request.timeout.ms? I could not find a way to set that specific property in the Peon's Kafka consumer config.

Thanks,
Peter

Hi Peter,

I think setting consumerProperties in the ioConfig section of your supervisor spec should do the trick. Something like:

"consumerProperties": {
  "bootstrap.servers": "localhost:XXXX",
  "request.timeout.ms":XXXX
}
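
If the supervisor spec is managed from a script, the same change can be applied programmatically. Below is a minimal Python sketch, assuming the spec is already loaded as a dict and the Overlord URL is supplied by you. It merges extra keys into ioConfig.consumerProperties and POSTs the result to the Overlord's /druid/indexer/v1/supervisor endpoint; resubmitting a spec for an existing supervisor replaces it, so the new tasks pick up the new consumer settings. The function names and the 60000 ms value below are illustrative assumptions, not recommendations.

```python
import copy
import json
import urllib.request


def with_consumer_properties(spec, props):
    """Return a copy of a Kafka supervisor spec with props merged into ioConfig.consumerProperties."""
    updated = copy.deepcopy(spec)
    # Current specs nest ioConfig under "spec"; older ones keep it at the top level.
    target = updated.get("spec", updated)
    io_config = target.setdefault("ioConfig", {})
    io_config.setdefault("consumerProperties", {}).update(props)
    return updated


def submit_supervisor(overlord_url, spec):
    """POST the spec to the Overlord and return its parsed JSON response."""
    req = urllib.request.Request(
        f"{overlord_url}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage, with a hypothetical host: submit_supervisor("http://overlord-host:8090", with_consumer_properties(spec, {"request.timeout.ms": 60000})). The merge copies the spec first, so the original dict is left unchanged.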

Best,

Mark

Thanks Mark,

That makes sense; let me give it a try.

Peter

That worked, thanks!