io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

Hi Team,

We have got 2 FAILED tasks recently for the Kafka indexing service, both with the exception "io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting". We checked the overlord logs but did not find any exception there that would explain the failure.

We have not set a completion timeout (`completionTimeout`) or task duration (`taskDuration`), so by default the task duration is 60 minutes and the completion timeout is 30 minutes. As per this calculation, both of these tasks failed before the 90-minute mark (60+30).
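For anyone checking their own setup: these two durations live in the supervisor spec's ioConfig as ISO-8601 periods. A minimal sketch (Python; the topic name and values are hypothetical, shown only to illustrate the 60+30 arithmetic above):

```python
from datetime import timedelta

# Hypothetical duration-related fields of a Kafka supervisor ioConfig.
# Both use ISO-8601 period notation; the values below are the defaults.
io_config = {
    "topic": "dslam-events",       # hypothetical topic name
    "taskDuration": "PT1H",        # task reads from Kafka for 60 minutes
    "completionTimeout": "PT30M",  # then has 30 minutes to publish segments
}

def max_task_lifetime(task_duration: timedelta,
                      completion_timeout: timedelta) -> timedelta:
    """Upper bound on how long a task should live before being stopped."""
    return task_duration + completion_timeout

lifetime = max_task_lifetime(timedelta(hours=1), timedelta(minutes=30))
print(int(lifetime.total_seconds() // 60))  # 90
```

So with the defaults, a task lasting 79 minutes (as below) is still inside the expected 90-minute window.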

index_kafka_DSLAM_6572efbbc2747ad_hpjglfkm (task duration: 79 mins)
start time: 2017-07-14T01:21:50,698
end time: 2017-07-14T02:40:38,157
2017-07-14T00:38:49,704 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_DSLAM_fe0ce3ca206fc86_gpmgblmn, type=index_kafka, dataSource=DSLAM}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting
at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:517) ~[?:?]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

Regards,
Arpan Khagram
+91 8308993200

Hi Druid Team - can you please help with this? The error occurs every day for a few of the tasks, and I cannot find any reason for it anywhere (I checked the overlord logs and the middle manager task logs).

Also, the Kafka indexing tasks do recover and resume from earlier offsets, but it is frustrating that tasks keep failing without any known reason.

Regards,

Arpan Khagram

+91 8308993200

The information given here is too limited to actually figure out what might be wrong. Do you see any WARN logs in the overlord or task logs? Do you see any INFO log at the overlord saying "Not updating metadata, existing state is not the expected start state."?

The task payload's partition offsets are not equal to the values in the MySQL metadata DB.
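That mismatch is exactly what makes the transactional publish abort: the publish only succeeds when the offsets the task started from match the offsets last committed to the metadata store. A simplified illustration of that comparison (the offset values are hypothetical, and this is not Druid's actual code, just the shape of the check):

```python
def offsets_match(task_start_offsets: dict, committed_offsets: dict) -> bool:
    """Return True when the task's per-partition start offsets equal the
    offsets committed in the metadata store.

    In Druid, a mismatch rejects the transactional segment publish, and
    the task fails with the ISE seen in this thread.
    """
    return task_start_offsets == committed_offsets

# Hypothetical per-partition offsets: {partition: offset}
committed = {0: 1500, 1: 2300}
task_start = {0: 1500, 1: 2250}  # partition 1 is out of sync

print(offsets_match(task_start, committed))  # False -> publish aborts
```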

On Friday, July 14, 2017 at 1:42:13 PM UTC+8, Arpan Khagram wrote:

Yes, that is the most probable reason why this might happen: the state at the overlord gets out of sync with the datasource metadata in the DB. This should only happen when either someone manually edits the payload in the metadata store or there is a bug. The way to resolve it is to reset the supervisor.
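For reference, the reset goes through the overlord's supervisor API (POST to `/druid/indexer/v1/supervisor/{id}/reset`). A small sketch that builds that request with the standard library; the host and supervisor id are placeholders, and this ignores any authentication your cluster may require:

```python
import urllib.request

def build_reset_request(overlord_host: str,
                        supervisor_id: str) -> urllib.request.Request:
    """Build a POST request to the overlord's supervisor reset endpoint.

    Note: resetting clears the stored offsets for the supervisor, so
    tasks restart from the configured offset policy (possibly re-reading
    or skipping data). Use it deliberately.
    """
    url = (f"http://{overlord_host}/druid/indexer/v1/"
           f"supervisor/{supervisor_id}/reset")
    return urllib.request.Request(url, method="POST")

# Placeholder host/port and supervisor id
req = build_reset_request("overlord.example.com:8090", "DSLAM")
print(req.full_url)
print(req.get_method())  # POST

# To actually send it (hits the network):
# urllib.request.urlopen(req)
```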

Hi all, to me it looks related to https://github.com/druid-io/druid/issues/3600, and the overlord logs suggest the same.

We already tried resetting the supervisor but it did not help. The overlord logs do not suggest any issue - they just log that the task has FAILED.

Regards,

Arpan Khagram

I also see similar issue. Any idea how to resolve this?

2017-08-08T09:15:23,071 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_Cube-msg-ABC_7bcc8190970f320_empolehl, type=index_kafka, dataSource=TEST-msg-fei1}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting
at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:517) ~[?:?]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]

2017-08-08T09:15:23,076 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_TEST-msg-ABC_7bcc8190970f320_empolehl] status changed to [FAILED].

Hello,

reporting the same problem (Druid 0.10.1).

7 tasks failed with this ERROR within a 15-second period:

2017-09-10T17:09:25,886 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_f04b3a4a7aee511_lieonnmb, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:27,302 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_dc8185cd1637933_dadcakie, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:31,947 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_43db9a6973f3fbd_mhfhgfpm, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:34,506 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_1bc3508aa30c8a5_mbjollmg, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:36,081 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_35186bcfb37a63a_dhcemlbo, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:40,234 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_ab94c121d583257_ikjbibgf, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:40,619 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_6fdbd1b7dd0da35_nikgmkph, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting


**dataSource_1 task success ratio: 94.77 %**, with **overall Overlord task success ratio: 97.4 %**

Regards,
Jan

Can you see any pattern? Is it happening after the overlord process is restarted, after the middleManagers are restarted, or under some other condition? Would it be possible for you, or anyone else in this thread, to share the full task logs of a failed task (remember the task log will contain the datasource, dimensions, metrics, and runtime properties, etc., which may include any passwords written in the runtime properties file), and also the relevant overlord log from around the time the task failed?