Hi Luis,
For the attached peon task, the MiddleManager and Overlord logs do not contain any details because they appear to be incomplete. The peon task completed at 2020-01-14T13:47:24,929, but the MiddleManager and Overlord logs only run until 2020-01-14 12:47.
However, I looked into one of the older Kafka indexing tasks for the supervisor [KafkaSupervisor-rt-idbox], and I see the error below in the Overlord log:
Task-Id: index_kafka_rt-idbox_4ed329169eec831_hcbhclbm
2020-01-14 12:46:39.004,“2020-01-14T12:46:39,004 INFO [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id=‘rt-idbox’, generationTime=2020-01-14T12:46:39.004Z, payload=KafkaSupervisorReportPayload{dataSource=‘rt-idbox’, topic=‘rt-idbox’, partitions=1, replicas=2, durationSeconds=1800, active=[{id=‘index_kafka_rt-idbox_1257830c5cc7315_dclelgnc’, startTime=2020-01-14T12:17:05.024Z, remainingSeconds=26}, {id=‘index_kafka_rt-idbox_1257830c5cc7315_dknffpaf’, startTime=2020-01-14T12:17:07.505Z, remainingSeconds=28}], publishing=[{id=‘index_kafka_rt-idbox_4ed329169eec831_hcbhclbm’, startTime=2020-01-14T11:46:56.488Z, remainingSeconds=18}, {id=‘index_kafka_rt-idbox_4ed329169eec831_adfklccf’, startTime=2020-01-14T11:46:56.066Z, remainingSeconds=18}], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=}}”
2020-01-14 12:47:06.116,“2020-01-14T12:47:06,116 ERROR [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - No task in [[index_kafka_rt-idbox_4ed329169eec831_hcbhclbm, index_kafka_rt-idbox_4ed329169eec831_adfklccf]] for taskGroup [0] succeeded before the completion timeout elapsed [PT1800S]!: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor}”
2020-01-14 12:47:06.116,“2020-01-14T12:47:06,116 INFO [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_rt-idbox_4ed329169eec831_hcbhclbm] because: [No task in pending completion taskGroup[0] succeeded before completion timeout elapsed]”
Kafka indexing tasks are expected to finish within the completion timeout. If they do not, the Kafka supervisor assumes that something is wrong and issues a kill/shutdown signal to the tasks, which is what seems to have happened here.
A running task will normally be in one of two states: reading or publishing. A task will remain in reading state for taskDuration, at which point it will transition to publishing state. A task will remain in publishing state for as long as it takes to generate segments, push segments to deep storage, and have them be loaded and served by a Historical process (or until completionTimeout elapses).
completionTimeout: The length of time to wait before declaring a publishing task as failed and terminating it. If this is set too low, your tasks may never publish. The publishing clock for a task begins roughly after taskDuration elapses.
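To make the timing concrete, here is a rough sketch of the arithmetic using the values from the log above (startTime=2020-01-14T11:46:56.488Z, durationSeconds=1800, completion timeout PT1800S). The function name publish_deadline is only for illustration, and the result is approximate because the publishing clock starts "roughly" after taskDuration.

```python
from datetime import datetime, timedelta

def publish_deadline(start_time, task_duration_s, completion_timeout_s):
    """Approximate time at which the supervisor gives up on a publishing task."""
    # The task reads for taskDuration, then switches to publishing.
    publishing_starts = start_time + timedelta(seconds=task_duration_s)
    # The publishing phase may last at most completionTimeout.
    return publishing_starts + timedelta(seconds=completion_timeout_s)

# Values taken from the supervisor report for index_kafka_rt-idbox_4ed329169eec831_hcbhclbm
start = datetime.fromisoformat("2020-01-14T11:46:56.488")
deadline = publish_deadline(start, task_duration_s=1800, completion_timeout_s=1800)
print(deadline.isoformat())
```

The computed deadline falls just before the 12:47:06 shutdown message in the Overlord log, which is consistent with the task being killed because the completion timeout elapsed.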
For now, please increase completionTimeout to 60 minutes (i.e. PT60M) and see if that helps.
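As a minimal sketch of that change, the snippet below sets completionTimeout in the ioConfig section of a supervisor spec held as a Python dict. The example spec is trimmed and hypothetical (your real spec also has dataSchema, tuningConfig, and so on), and I am assuming ioConfig sits at the top level of the spec; some Druid versions nest it under a "spec" key, so please adjust to match your spec.

```python
import copy

def with_completion_timeout(spec, timeout="PT60M"):
    """Return a copy of the supervisor spec with ioConfig.completionTimeout set."""
    updated = copy.deepcopy(spec)  # leave the original spec untouched
    updated.setdefault("ioConfig", {})["completionTimeout"] = timeout
    return updated

# Trimmed, hypothetical spec for illustration only
spec = {
    "type": "kafka",
    "ioConfig": {
        "topic": "rt-idbox",
        "taskDuration": "PT30M",
        "completionTimeout": "PT30M",
    },
}

new_spec = with_completion_timeout(spec)
print(new_spec["ioConfig"])
```

The updated spec then needs to be resubmitted to the Overlord for the new timeout to take effect.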
Additionally, I suggest going through the links below to fine-tune your Kafka ingestion:
https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html
https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#capacity-planning
Thanks and Regards,
Vaibhav