Kafka ingestion tasks failing

Hello,

Some of the tasks are failing immediately after task creation with:

statusCode : FAILED

duration : -1

Which supervisor config properties are responsible for this?

Also, a few tasks end with status SUCCESS, but the logs say otherwise:

2019-11-04T06:29:50,009 INFO [appenderator_merge_0] io.druid.segment.IndexMergerV9 - walked 500,000/1000000 rows in 2,590,636 millis.
2019-11-04T06:30:23,865 INFO [qtp230816477-180] io.druid.indexing.kafka.KafkaIndexTask - Stopping gracefully (status: [PUBLISHING])
2019-11-04T06:30:23,878 WARN [kafka-kerberos-refresh-thread-sub_stream_druid@XXX] org.apache.kafka.common.security.kerberos.KerberosLogin - [Principal=sub_stream_druid@XXX]: TGT renewal thread has been interrupted and will exit.
2019-11-04T06:30:23,879 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down immediately...
...
2019-11-04T06:30:23,914 INFO [task-runner-0-priority-0] io.druid.indexing.kafka.KafkaIndexTask - **The task was asked to stop before completing**
2019-11-04T06:30:23,914 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Appenderator already closed
2019-11-04T06:30:23,915 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli]
2019-11-04T06:30:23,915 WARN [publish-driver] io.druid.indexing.kafka.KafkaIndexTask - Stopping publish thread as we are interrupted, probably we are shutting down
...
2019-11-04T06:30:23,922 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli] status changed to [SUCCESS].
2019-11-04T06:30:23,932 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli",
"status" : "SUCCESS",
"duration" : 21470610
}

How can I check whether the segment got published in this case?

Thanks,

kkl.

Hi Krishna,

You can check for publish and segment handoff information in the logs. The logs will also give you the interval and ID of the segments.

2019-08-22T01:40:01,219 INFO [publish-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published segments
2019-08-22T01:40:59,457 INFO [coordinator_handoff_scheduled_0] org.apache.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Segment Handoff complete for dataSource

Check for the above information in the logs to verify if the segments got published.
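If you want to confirm it outside the task logs, you can also ask the Coordinator which segments it knows about for the datasource. A minimal sketch (the Coordinator host/port and the datasource name are placeholders, so adjust them for your setup):

# Ask the Coordinator for the published segment IDs of a datasource.
import json
from urllib.request import urlopen

COORDINATOR = "http://coordinator-host:8081"  # assumption: default Coordinator port
DATASOURCE = "analytics_events_8"             # from the task logs above

url = "%s/druid/coordinator/v1/metadata/datasources/%s/segments" % (COORDINATOR, DATASOURCE)
for segment_id in json.load(urlopen(url)):
    print(segment_id)

If the segment IDs from the task log show up in that list, the publish succeeded.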

Thanks,

Hemanth

Hi Kkl,

Could you check the MiddleManager and Overlord logs for the task in question? You will find more details on what happened there.
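If it is easier than logging into the boxes, you can also pull the task status and log through the Overlord API. A rough sketch (the Overlord host/port is a placeholder; the task id is the one from your logs):

# Fetch a task's status and the tail of its log from the Overlord REST API.
from urllib.request import urlopen

OVERLORD = "http://overlord-host:8090"  # assumption: default Overlord port
TASK_ID = "index_kafka_analytics_events_8_edc931cf3de20c6_bnkhpdli"

print(urlopen("%s/druid/indexer/v1/task/%s/status" % (OVERLORD, TASK_ID)).read())

# The full task log can be large, so print only the last part of it.
log = urlopen("%s/druid/indexer/v1/task/%s/log" % (OVERLORD, TASK_ID)).read()
print(log.decode("utf-8", "replace")[-2000:])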

Please also share your supervisorSpec.

Thanks and Regards,

Vaibhav

Thanks for the reply.

Druid version: 0.12

The supervisor spec is as follows:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "events",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": []
        },
        "dimensionsSpec": {
          "dimensions": []
        },
        "timestampSpec": {
          "column": "time",
          "format": "millis"
        }
      }
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "day",
      "queryGranularity": "NONE",
      "rollup": "false"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "reportParseExceptions": false
  },
  "ioConfig": {
    "topic": "analytics_events",
    "replicas": 1,
    "taskCount": 30,
    "taskDuration": "PT30M",
    "completionTimeout": "PT120M",
    "consumerProperties": {
      "bootstrap.servers": "",
      "security.protocol": "",
      "sasl.kerberos.service.name": "kafka",
      "group.id": "druid_6",
      "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer"
    }
  }
}
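For reference, I submit this spec to the Overlord's supervisor endpoint; a minimal sketch of how I post it (the host and file name are placeholders):

# (Re)submit the supervisor spec. Posting an updated spec replaces the
# running supervisor for the same datasource.
from urllib.request import Request, urlopen

OVERLORD = "http://overlord-host:8090"  # assumption: default Overlord port
with open("supervisor-spec.json", "rb") as f:  # the spec above, saved to a file
    body = f.read()

req = Request("%s/druid/indexer/v1/supervisor" % OVERLORD, data=body,
              headers={"Content-Type": "application/json"})
print(urlopen(req).read())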

Please suggest any improvements to this.

Also, why does this kind of error occur?

java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
	at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014) ~[?:1.8.0_222]
	at java.util.ArrayList.subList(ArrayList.java:1004) ~[?:1.8.0_222]
	at io.druid.segment.realtime.appenderator.AppenderatorImpl.persistAll(AppenderatorImpl.java:408) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
	at io.druid.segment.realtime.appenderator.AppenderatorImpl.push(AppenderatorImpl.java:518) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
	at io.druid.segment.realtime.appenderator.BaseAppenderatorDriver.pushInBackground(BaseAppenderatorDriver.java:345) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
	at io.druid.segment.realtime.appenderator.StreamAppenderatorDriver.publish(StreamAppenderatorDriver.java:264) ~[druid-server-0.12.1.3.1.0.0-78.jar:0.12.1.3.1.0.0-78]
	at io.druid.indexing.kafka.KafkaIndexTask.lambda$createAndStartPublishExecutor$1(KafkaIndexTask.java:364) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_222]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

Hello,

Here are the task logs of a Kafka ingestion task in the reading state.

I see these log lines multiple times before taskDuration elapses:

2019-11-07T04:33:13,163 INFO [analytics_events_8-incremental-persist] io.druid.java.util.common.io.smoosh.FileSmoosher - Created smoosh file [...] of size [18995216] bytes.
2019-11-07T04:33:13,743 INFO [analytics_events_8-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Committing metadata[..] for sinks[..].
2019-11-07T04:33:13,757 INFO [appenderator_persist_0] io.druid.indexing.kafka.KafkaIndexTask - Persist completed with metadata [AppenderatorDriverMetadata{..}]
2019-11-07T04:33:13,757 INFO [analytics_events_8-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Segment[..], Hydrant[..] already swapped. Ignoring request to persist.
2019-11-07T04:33:13,757 INFO [analytics_events_8-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Segment[..], persisting Hydrant[FireHydrant{index=io.druid.segment.incremental.OnheapIncrementalIndex@151afe15, queryable=[..], count=1}]
2019-11-07T04:33:13,757 WARN [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - **Ingestion was throttled for [10,352] millis because persists were pending.**
2019-11-07T04:33:13,757 INFO [analytics_events_8-incremental-persist] io.druid.segment.IndexMergerV9 - Starting persist for interval[2019-11-04T00:00:00.000Z/2019-11-05T00:00:00.000Z], rows[75,000]
...
2019-11-07T04:33:19,421 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Hydrant[FireHydrant{index=io.druid.segment.incremental.OnheapIncrementalIndex@151afe15, queryable=[..], count=1}] **hasn't persisted yet, persisting.** Segment[analytics_events_8_2019-11-04T00:00:00.000Z_2019-11-05T00:00:00.000Z_2019-11-04T00:00:00.064Z_1862]
2019-11-07T04:33:19,422 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Submitting persist runnable for dataSource[analytics_events_8]

Does this mean the indexes are not persisted and no segments can be handed off?

How can I avoid this?

Hi all,

Any updates on this?

Has anyone faced this issue? How can it be solved?

What are you using for deep storage? Is this a new setup?

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io

Hi,

The setup is Hortonworks Druid, so deep storage is HDFS.

It looks like persists from Druid to HDFS are taking too long. Can you check resources on the Hadoop cluster? It could also point to another problem, like a slow network connection.
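You can also give the task more headroom before it throttles by tuning the persist-related properties in the tuningConfig. A sketch only; the values below are illustrative starting points, not recommendations:

"tuningConfig": {
  "type": "kafka",
  "reportParseExceptions": false,
  "maxRowsInMemory": 150000,
  "intermediatePersistPeriod": "PT10M",
  "maxPendingPersists": 2
}

A larger maxRowsInMemory means fewer, bigger persists, and a non-zero maxPendingPersists lets ingestion keep going while a persist is still in flight, at the cost of extra heap.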

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io