Coordinator error: io.druid.java.util.common.ISE:[zkPath] was never removed! Failing this operation!

Hi Team,

We are facing a recurrent issue on our coordinator node (druid 0.11) during segment balancing - do you have any suggestions, please?

From logs, it looks like the loading of the segment on the historical node take too much time (around 16 seconds) and the coordinator throw the error after 10 seconds :

coordinator log

2018-04-03T04:21:57,359 INFO [Coordinator-Exec–0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - Moving [DRUID_SEGMENT] from [HISTORICAL1] to [HISTORICAL2]

2018-04-03T04:21:57,359 INFO [Coordinator-Exec–0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/PATH/loadQueue/HISTORICAL2] to load segment[DRUID_SEGMENT]

2018-04-03T04:22:31,419 INFO [Master-PeonExec–0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2] loading [DRUID_SEGMENT]

2018-04-03T04:22:31,419 INFO [Master-PeonExec–0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2] processing segment[DRUID_SEGMENT]

2018-04-03T04:22:41,423 ERROR [Master-PeonExec–0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=819500829, shardSpec=NumberedShardSpec{partitionNum=3, partitions=9}, metrics=[METRICS], dimensions=[DIMENSIONS], version=‘2018-01-29T12:58:05.889Z’, loadSpec={type=s3_zip, bucket=S3_BUCKET, key=S3_PATH/index.zip, S3Schema=s3n}, interval=2018-01-28T02:00:00.000Z/2018-01-28T03:00:00.000Z, dataSource=‘DATASOURCE’, binaryVersion=‘9’}}].

io.druid.java.util.common.ISE: /druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT was never removed! Failing this operation!

at io.druid.server.coordinator.LoadQueuePeon$1.run(LoadQueuePeon.java:259) [druid-server-0.11.0-iap7.jar:0.11.0-iap7]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_161]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_161]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_161]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

2018-04-03T04:22:47,845 INFO [Coordinator-Exec–0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/PATH/loadQueue/HISTORICAL2] to drop segment[DRUID_SEGMENT]

2018-04-03T04:22:48,113 INFO [Master-PeonExec–0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2] dropping [DRUID_SEGMENT]

2018-04-03T04:22:48,113 INFO [Master-PeonExec–0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2] processing segment[DRUID_SEGMENT]

2018-04-03T04:22:48,129 INFO [main-EventThread] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/PATH/loadQueue/HISTORICAL2] done processing [/druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT]

``

historical (destination) log

2018-04-03T04:22:31,423 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: DRUID_SEGMENT] with zNode[/druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT].

2018-04-03T04:22:31,424 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment DRUID_SEGMENT

2018-04-03T04:22:47,548 INFO [ZkCoordinator-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[DRUID_SEGMENT] at existing path[/druid/PATH/segments/HISTORICAL2/HISTORICAL2_historical__replica_tier_2018-03-15T12:10:14.588Z_96ed0677ea904964aec07141d96f2e2314]

2018-04-03T04:22:47,555 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [LOAD: DRUID_SEGMENT]

2018-04-03T04:22:47,555 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT] was removed

2018-04-03T04:22:48,118 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[DROP: DRUID_SEGMENT] with zNode[/druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT].

2018-04-03T04:22:48,123 INFO [ZkCoordinator-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[DRUID_SEGMENT] at path[/druid/PATH/segments/HISTORICAL2/HISTORICAL2_historical__replica_tier_2018-03-15T12:10:14.588Z_96ed0677ea904964aec07141d96f2e2314]

2018-04-03T04:22:48,126 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completely removing [DRUID_SEGMENT] in [30,000] millis

2018-04-03T04:22:48,129 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [DROP: DRUID_SEGMENT]

2018-04-03T04:22:48,129 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/PATH/loadQueue/HISTORICAL2/DRUID_SEGMENT] was removed

``

Thank you,

Dan

I am facing the same issue, any update to fix this?

Chitra,

Check this thread - https://groups.google.com/forum/#!topic/druid-user/oSCPnpKkLwQ . Possible issue with ingest spec. The user in the thread marked their problematic segments as unused.

Rommel Garcia