Transaction failure publishing segments, aborting

Hi,

I am unable to find the root cause of index_kafka tasks failing.

Setup Details:

a. Kafka cluster with 2 machines

b. 1 ZooKeeper, 3 MiddleManagers and Historicals, 1 Broker

c. Hadoop cluster with 2 machines

Versions Used:

Druid: 0.11.0

Hadoop: 2.7.3

About 80% of the tasks succeed, but 20% fail, and I am not able to find any root cause.

2018-03-16T05:34:11,649 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_ERS_c5875771098b834_mekalgea, type=index_kafka, dataSource=ERS}]

io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:581) ~[?:?]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.11.0-iap2.jar:0.11.0-iap2]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.11.0-iap2.jar:0.11.0-iap2]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

2018-03-16T05:34:11,653 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_ERS_c5875771098b834_mekalgea] status changed to [FAILED].

2018-03-16T05:34:11,654 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_kafka_ERS_c5875771098b834_mekalgea",

"status" : "FAILED",

"duration" : 3863156

}

Regards,

Sudhanshu Lenka

We have exactly the same problem. Can you please share how you solved it?

On Friday, March 16, 2018 at 2:32:03 PM UTC+8, Sudhanshu Lenka wrote:

Hi Sudhanshu,
Can you paste your Kafka ingestion spec?

I suspect you might have to tweak taskDuration and completionTimeout.
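For reference, here is a minimal sketch of where those two settings live in a Kafka supervisor spec. The topic name, broker addresses, and durations below are illustrative placeholders, not recommendations — you would need to size completionTimeout against how long your segment publish actually takes (the failed task above ran for duration 3863156 ms, roughly 64 minutes, before being marked FAILED):

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "ERS"
  },
  "ioConfig": {
    "topic": "your-topic-name",
    "consumerProperties": {
      "bootstrap.servers": "kafka1:9092,kafka2:9092"
    },
    "taskDuration": "PT1H",
    "completionTimeout": "PT30M"
  }
}
```

taskDuration controls how long a task reads before it stops and starts publishing; completionTimeout is the extra time the supervisor allows for the publish to finish before it kills the task as failed. If publishing regularly takes longer than completionTimeout, raising it (or lowering taskDuration so each task publishes less data) is a common first thing to try.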

How much data are you trying to ingest via this Kafka ingestion job?

Something to think about: it looks like you are also on a fairly old Druid version (0.11).

It may be time to upgrade to at least Druid 0.15, if not 0.16.

Thanks,

–siva