Some of the tasks are failing immediately after creation, with:
statusCode : FAILED
duration : -1
Which supervisor config properties are responsible for this?
A few tasks end with status SUCCESS, but the logs say otherwise:
2019-11-04T06:29:50,009 INFO [appenderator_merge_0] io.druid.segment.IndexMergerV9 - walked 500,000/1000000 rows in 2,590,636 millis.
2019-11-04T06:30:23,865 INFO [qtp230816477-180] io.druid.indexing.kafka.KafkaIndexTask - Stopping gracefully (status: [PUBLISHING])
2019-11-04T06:30:23,878 WARN [kafka-kerberos-refresh-thread-sub_stream_druid@XXX] org.apache.kafka.common.security.kerberos.KerberosLogin - [Principal=sub_stream_druid@XXX]: TGT renewal thread has been interrupted and will exit.
2019-11-04T06:30:23,879 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down immediately...
...
2019-11-04T06:30:23,914 INFO [task-runner-0-priority-0] io.druid.indexing.kafka.KafkaIndexTask - **The task was asked to stop before completing**
2019-11-04T06:30:23,914 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Appenderator already closed
2019-11-04T06:30:23,915 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli]
2019-11-04T06:30:23,915 WARN [publish-driver] io.druid.indexing.kafka.KafkaIndexTask - Stopping publish thread as we are interrupted, probably we are shutting down
...
2019-11-04T06:30:23,922 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli] status changed to [SUCCESS].
2019-11-04T06:30:23,932 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli",
"status" : "SUCCESS",
"duration" : 21470610
}
How can I check whether the segments got published in this case?
You can check the task logs for the publish and segment handoff messages. The logs will also give you the interval and ID of each segment.
2019-08-22T01:40:01,219 INFO [publish-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published segments
2019-08-22T01:40:59,457 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Segment Handoff complete for dataSource
Look for messages like the above in the task logs to verify that the segments got published.
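If you prefer to check programmatically, here is a minimal sketch that pulls the task log through the Overlord API and looks for those two messages. It assumes the Overlord is reachable at http://localhost:8090 with no authentication and uses the task ID from the logs above; adjust the host, port, and task ID for your cluster.

# Sketch: fetch a task's log from the Druid Overlord and check for the
# publish and handoff messages quoted above.
import requests

OVERLORD = "http://localhost:8090"  # assumed Overlord URL, adjust for your cluster
TASK_ID = "index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli"  # task ID from the logs above

log_text = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/log").text

print("published :", "Published segments" in log_text)
print("handed off:", "Segment Handoff complete" in log_text)

You can also confirm the segments landed in the metadata store by querying the Coordinator, e.g. GET /druid/coordinator/v1/metadata/datasources/<dataSource>/segments, and checking that the segment IDs reported in the task log appear in the response.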
java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014) ~[?:1.8.0_222]
at java.util.ArrayList.subList(ArrayList.java:1004) ~[?:1.8.0_222]
at io.druid.segment.realtime.appenderator.AppenderatorImpl.persistAll(AppenderatorImpl.java:408) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
at io.druid.segment.realtime.appenderator.AppenderatorImpl.push(AppenderatorImpl.java:518) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
at io.druid.segment.realtime.appenderator.BaseAppenderatorDriver.pushInBackground(BaseAppenderatorDriver.java:345) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
at io.druid.segment.realtime.appenderator.StreamAppenderatorDriver.publish(StreamAppenderatorDriver.java:264) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
at io.druid.indexing.kafka.KafkaIndexTask.lambda$createAndStartPublishExecutor$1(KafkaIndexTask.java:364) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_222]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
It looks like persists from Druid to HDFS are taking too long. Can you check the resources on the Hadoop cluster? It could also point to another problem, such as a slow network connection.
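As a rough sanity check, the IndexMergerV9 line quoted earlier ("walked 500,000/1000000 rows in 2,590,636 millis.") already points the same way; a back-of-the-envelope calculation of the implied merge/persist rate, using only the numbers from that log line:

# Rate implied by "walked 500,000/1000000 rows in 2,590,636 millis."
rows = 500_000
millis = 2_590_636

print(f"{rows / (millis / 1000):.0f} rows/s over {millis / 60000:.1f} minutes")
# ~193 rows/s, i.e. roughly 43 minutes for half the rows of one segment,
# which is consistent with persists/merges being unusually slow.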