[druid-user] druid not automatically removing segments

Hello everyone,

We have a deployment with a retention time of 14 days, and Druid is configured to remove unused segments from deep storage automatically every day. However, the removal process doesn’t seem to do anything. We don’t see any kill tasks in the tasklogs directories, and no errors or exceptions in the logs either (or perhaps we don’t know how to recognize them).

It worked for some time but after a restart it seems to have stopped.

Has anyone encountered this situation or a suggestion on how to investigate?

Thanks in advance,

Gijs van Enckevort

Could you check whether the coordinator config includes the necessary “druid.coordinator.kill” settings?

Here is a snippet from the settings. As far as I can tell, it includes everything we need (but I guess that is part of the question).

druid.coordinator.period.indexingPeriod=PT24H
druid.coordinator.kill.on=true << the requested parameter
druid.coordinator.kill.period=PT25H << we would have put 24H (or 1D) but the docs read “Value must be greater than druid.coordinator.period.indexingPeriod” which (as shown above) is 24H
druid.coordinator.kill.durationToRetain=P14D << this matches with the retention time configured in the rules as well. We have verified that segments older than 14 days are marked as “unused” in the metadata database as expected
druid.coordinator.kill.maxSegments=1000 << this was a guess, we probably need way less than this, but we have some “backlog” to remove
druid.coordinator.kill.killAllDataSources=false << because we’re using the whitelist below
druid.coordinator.kill.killDataSourceWhitelist=["topic-1","topic-2"] << I’ve replaced the actual topic names with placeholders “topic-1” and “topic-2”. We only have two topics

We have tried to submit a manual kill task and it works as expected.
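For anyone comparing notes, here is a rough sketch of how a manual kill task payload like that could be built. It is not the exact request we sent. The helper names `build_kill_task` and `submit_task`, the 1970 start of the interval, and the Overlord address are assumptions for illustration. The task fields (`type`, `dataSource`, `interval`) and the `/druid/indexer/v1/task` path are, as far as I know, the standard ones.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_kill_task(data_source, now, retain_days=14, start="1970-01-01T00:00:00Z"):
    """Build a kill task spec covering everything older than the retention window.

    `start` is an arbitrary early timestamp (an assumption), chosen so the
    interval covers the whole backlog of unused segments.
    """
    cutoff = now - timedelta(days=retain_days)
    end = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"type": "kill", "dataSource": data_source, "interval": f"{start}/{end}"}


def submit_task(overlord_url, task):
    """POST the task spec to the Overlord; `overlord_url` is a placeholder."""
    request = urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_kill_task("topic-1", datetime.now(timezone.utc))` gives a spec that only touches segments already marked unused and lying entirely before the 14-day cutoff. `submit_task` is left uncalled here because it needs a reachable Overlord.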

Thanks,

Gijs van Enckevort

Hmmmmm I don’t know… I note that your druid.coordinator.period.indexingPeriod is much longer than the default (PT1800S, i.e. 30 minutes). I wonder if it being as long as 24 hours is causing some issue…?
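To make that hunch a bit more concrete, here is a toy simulation. It does not use Druid's actual scheduling code. It assumes the coordinator only gets a chance to issue a kill task once per indexingPeriod cycle, that the first chance comes one full cycle after startup, and that a kill is issued only when at least kill.period has passed since the previous one. The function name `simulate_kill_times` is made up for this sketch.

```python
from datetime import timedelta


def simulate_kill_times(check_every, kill_period, horizon):
    """Return offsets from coordinator start at which a kill would be issued.

    ASSUMED model (not taken from Druid's source): a check happens at every
    multiple of `check_every`, and a kill is issued if none has happened yet
    or if at least `kill_period` has elapsed since the last one.
    """
    kills = []
    last_kill = None
    t = check_every
    while t <= horizon:
        if last_kill is None or t - last_kill >= kill_period:
            kills.append(t)
            last_kill = t
        t += check_every
    return kills


# Settings from the thread: checks every 24 hours, kill period of 25 hours.
posted = simulate_kill_times(timedelta(hours=24), timedelta(hours=25), timedelta(days=5))

# Default indexingPeriod of 30 minutes with the same kill period.
default = simulate_kill_times(timedelta(minutes=30), timedelta(hours=25), timedelta(days=5))
```

Under those assumptions the posted settings kill only on every second cycle (roughly every 48 hours), and nothing happens during the first 24 hours after a restart. With the default 30-minute period a kill would go out about every 25 hours. If the real coordinator behaves anything like this, lowering indexingPeriod back towards the default would be a cheap thing to try.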