I reindex my data into the same datasource on a schedule (with appendToExisting = false). Sometimes I find that the old version of the segments still exists even though the new segments are already available. It doesn't happen every run, maybe once a day. I'm really confused about why the old segments are sometimes not dropped; they waste disk space on the servers that host them, and queries are also directed to these old segments.
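For reference, my reindex spec looks roughly like this. This is a trimmed sketch, not my full spec; the datasource name, interval, and granularities are placeholders, and the exact field set depends on your Druid version:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "my_datasource",
        "interval": "2023-01-01/2023-01-02"
      },
      "appendToExisting": false
    },
    "dataSchema": {
      "dataSource": "my_datasource",
      "timestampSpec": { "column": "__time", "format": "auto" },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "day"
      }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

With `appendToExisting: false`, the task should publish a new version over the interval and the Coordinator should then mark the old version's segments unused.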
Out of curiosity, how are you finding these? I don’t imagine it’s related, but I came across this post and got to wondering if you’re searching for these old versions of segments or finding them some other way.
Related: once you’ve found them, how are you dropping them?
To change the query granularity from hour to day, I reindex the data. After the reindexing task finishes, both the old-version and new-version segments exist and are marked as used. Queries also ignore the new-version data.
I can't drop them by interval, because that would drop the new-version segments too. And dropping the old segments one by one by segment id is tedious.
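If it helps, dropping by segment id doesn't have to be manual. I believe recent Druid versions expose a Coordinator endpoint that marks a specific list of segment ids unused, leaving the other versions in the same interval alone (please verify the endpoint against your version's API docs). A minimal sketch, with a placeholder coordinator address, datasource, and segment id:

```python
import json

# Hypothetical coordinator address; adjust for your cluster.
COORDINATOR = "http://localhost:8081"


def mark_unused_request(datasource, segment_ids):
    """Build the Coordinator API call that marks only the given segment
    ids unused, without touching other segment versions in the interval.
    (Endpoint name taken from recent Druid docs; verify for your version.)"""
    url = f"{COORDINATOR}/druid/coordinator/v1/datasources/{datasource}/markUnused"
    payload = json.dumps({"segmentIds": segment_ids})
    return url, payload


# Example with a made-up old-version segment id:
url, payload = mark_unused_request(
    "my_datasource",
    ["my_datasource_2023-01-01T00:00:00.000Z_2023-01-02T00:00:00.000Z_old-version"],
)
```

The request itself can then be sent as a POST with a `Content-Type: application/json` header (e.g. via `urllib.request` or `curl`), so a script can sweep up all the stale ids in one pass.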
This is not expected behavior. I just tried this myself, and Druid only ever kept one active segment after the reindex, no matter how many times I tried. I also tested changing the granularity.
What version of Druid are you using? Maybe you could follow the task/Coordinator logs and check what's happening behind the scenes.
I found that one subtask was killed by the monitor, and one partition of that interval was missing.