When will a realtime node delete local segments?

Hi group,

Recently I found that our Druid realtime node (standalone) failed to delete some local segments that had already been pushed to deep storage. After quickly scanning the code, it seems the realtime node listens to ZNodes and holds local segments until it is notified that a historical node has loaded exactly that segment.

What I hit seems to be this: my realtime node ran into serious GC trouble, lost its ZK connection for a while, missed the ZNode notifications, and now keeps the local segments forever. Please correct me if I’m wrong, thanks!

PS: I’m using Druid 0.8.1 with a 3-node ZK cluster.

Does restarting the realtime node work? That should get it to re-scan the announcements.

Hi Gian,

It seems restarting does not work, or maybe only partially does. If the segments generated by realtime indexing have more shards than those generated by batch indexing, say 5 vs. 3, then when batch indexing for the same interval finishes and its segments get loaded onto historical nodes, will the realtime node have no chance to learn that shards 3 and 4 were ever loaded?
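To make the scenario concrete, here is a toy model in Python. It is not Druid's code: the function name is made up, and it assumes hand-off is matched per shard (partition) number within one interval, which is only what the behaviour above suggests.

```python
# Toy model of the shard mismatch described above; not Druid's actual logic.
# Assumption: a realtime shard is only dropped locally once a historical
# node announces a segment with the same shard number for that interval.

def shards_never_handed_off(realtime_shards, historical_shards):
    """Return the realtime shard numbers that no historical node announces."""
    return sorted(set(realtime_shards) - set(historical_shards))


realtime = range(5)  # shards 0..4 produced by realtime indexing
batch = range(3)     # shards 0..2 produced by batch indexing, loaded on historicals

print(shards_never_handed_off(realtime, batch))  # [3, 4]
```

Under that assumption, shards 3 and 4 would wait forever, which matches what I see on disk.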

On Friday, October 23, 2015 at 4:43:05 AM UTC+8, Gian Merlino wrote:

Hey Zhao,

I think this is a bug. I filed it here: https://github.com/druid-io/druid/issues/1851

I think you can work around it by stopping the realtime node, removing that segment’s sink directory on disk (in your basePersistDirectory), and then starting the realtime node back up.
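If you want to script the removal step, a minimal sketch in Python could look like the following. The helper name and both path arguments are hypothetical; the exact layout under basePersistDirectory depends on your setup, so pass the sink directory explicitly and run this only while the realtime node is stopped.

```python
import shutil
from pathlib import Path


def remove_sink_dir(base_persist_dir, sink_dir):
    """Delete sink_dir, but only if it lies strictly inside base_persist_dir.

    Returns True if a directory was removed, False if it did not exist.
    Both arguments are placeholders for paths from your own configuration.
    """
    base = Path(base_persist_dir).resolve()
    target = Path(sink_dir).resolve()
    # Refuse to touch anything outside the persist directory (or the
    # persist directory itself), to guard against a mistyped path.
    if base not in target.parents:
        raise ValueError(f"{target} is not inside {base}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Example call (paths are made up):
# remove_sink_dir("/path/to/basePersistDirectory",
#                 "/path/to/basePersistDirectory/<sink directory>")
```

The containment check is there so that a typo cannot delete something outside the persist directory.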

Hi Gian,

Thanks for the reply.

Removing the basePersistDirectory works.

On Saturday, October 24, 2015 at 6:18:03 AM UTC+8, Gian Merlino wrote: