Druid 0.11.0: 'Zombie' shard after reindexing with "ingestSegment" firehose

Hi,

We are reindexing data produced by the Kafka Indexing Service (which creates lots of shards) with the "ingestSegment" firehose to get fewer, bigger segments/files.
The reindexing happens in the same datasource, so we are overwriting the small segments with one big segment (roughly 150 to 1). From time to time I run a "kill" task to
remove all the unused segments from the metadata storage and HDFS.
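For reference, the tasks look roughly like this. This is only a minimal sketch, not our exact specs: the overlord host is a placeholder, the dataSchema details are omitted, and the interval is just an example.

import json
import requests

OVERLORD = "http://overlord-host:8090"   # placeholder; default overlord port assumed
DATASOURCE = "my_datasource_supervisor"
INTERVAL = "2018-03-13T00:00:00.000Z/2018-03-14T00:00:00.000Z"

# Reindex: a plain "index" task that reads the existing segments of the same
# datasource back in through the "ingestSegment" firehose.
reindex_task = {
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": DATASOURCE,
            # parser, metricsSpec and granularitySpec omitted in this sketch;
            # ours mirror the original Kafka indexing spec
        },
        "ioConfig": {
            "type": "index",
            "firehose": {
                "type": "ingestSegment",
                "dataSource": DATASOURCE,
                "interval": INTERVAL,
            },
        },
        "tuningConfig": {"type": "index"},
    },
}

# Kill: permanently removes *unused* segments of the interval from the
# metadata store and from deep storage (HDFS).
kill_task = {"type": "kill", "dataSource": DATASOURCE, "interval": INTERVAL}

def submit(task):
    """Submit a task to the overlord and return its response (the task id)."""
    r = requests.post(OVERLORD + "/druid/indexer/v1/task",
                      data=json.dumps(task),
                      headers={"Content-Type": "application/json"})
    r.raise_for_status()
    return r.json()

# submit(reindex_task) first; submit(kill_task) only after the reindexed
# segment has been published and the old shards are marked unused.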
All this works fine, but today I noticed that there is a shard (#52) which has been 'overwritten' but apparently was not removed from our cluster. (I can see it in the web GUI at :8081, see screenshot.)

First I checked coordinator log and found the following:

2018-03-15T08:17:05,272 WARN [ZkCoordinator-Exec--0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[/druid_segment_cache/info_dir/my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T08:17:05,630 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[DROP: my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52] with zNode[/druid/loadQueue/hadoop130:8083/my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52].
2018-03-15T08:17:05,630 WARN [ZkCoordinator-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T08:17:05,631 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completely removing [my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52] in [30,000] millis
2018-03-15T08:17:05,632 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [DROP: my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T08:17:05,632 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/hadoop130:8083/my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52] was removed

But…
- the segment does not exist in HDFS
- the segment does not exist in the metadata storage
- the segment does not exist in our segment-cache/info_dir
…it only exists in the web GUI and in the log file.
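One way I could confirm this from the coordinator's point of view is to compare what it reports as currently served with what the metadata store marks as used. Again only a rough sketch; the coordinator hostname is a placeholder and the endpoints are the standard coordinator HTTP API.

import requests

COORDINATOR = "http://coordinator-host:8081"  # placeholder
DATASOURCE = "my_datasource_supervisor"

def segment_ids(url):
    """Return the set of segment identifiers from a coordinator endpoint."""
    r = requests.get(url)
    r.raise_for_status()
    return {s["identifier"] if isinstance(s, dict) else s for s in r.json()}

# Segments currently served in the cluster, as the coordinator sees them.
served = segment_ids(
    COORDINATOR + "/druid/coordinator/v1/datasources/" + DATASOURCE + "/segments")

# Segments marked as used in the metadata store.
used = segment_ids(
    COORDINATOR + "/druid/coordinator/v1/metadata/datasources/" + DATASOURCE + "/segments")

# Anything served but no longer in the metadata store is a 'zombie' candidate.
for seg in sorted(served - used):
    print("served but not in metadata:", seg)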

I thought this might be a problem/error with the coordinator, so I restarted both coordinators we have, one after the other. Unfortunately, this did not solve the problem, but the log now looks different.

2018-03-15T10:00:00,162 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/loadQueue/hadoop130:8083] to drop segment[my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T10:00:00,186 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/loadQueue/hadoop130:8083] dropping [my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T10:00:00,186 INFO [Master-PeonExec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/loadQueue/hadoop130:8083] processing segment[my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]
2018-03-15T10:00:00,190 INFO [main-EventThread] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/loadQueue/hadoop130:8083] done processing [/druid/loadQueue/hadoop130:8083/my_datasource_supervisor_2018-03-13T00:00:00.000Z_2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52]

…which occurs periodically.
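To narrow down whether it is really the historical (hadoop130:8083) that keeps announcing the shard, or just stale state on the coordinator, something like the following could be run against the coordinator API. Again only a sketch; the coordinator hostname is a placeholder and the segment id is the one from the log above.

import requests

COORDINATOR = "http://coordinator-host:8081"  # placeholder
ZOMBIE = ("my_datasource_supervisor_2018-03-13T00:00:00.000Z_"
          "2018-03-14T00:00:00.000Z_2018-03-13T00:00:00.372Z_52")

# List all data servers known to the coordinator, then check which of them
# still reports the shard among its served segments.
servers = requests.get(COORDINATOR + "/druid/coordinator/v1/servers").json()
for server in servers:
    segs = requests.get(
        COORDINATOR + "/druid/coordinator/v1/servers/" + server + "/segments").json()
    if ZOMBIE in segs:
        print(server, "still announces the shard")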

What caused the problem and how can we prevent it?
Is it caused by the coordinator, ZooKeeper, a peon, …?
How do we get rid of this ‘zombie’ shard?

Thanks, Alex