Removal of old segments from deep storage

Hi,

When I override existing segments with a new version, for example due to a schema change in the daily batch job that updates the data, I still see that the old segments are not deleted from deep storage.

How can I completely drop those old segments from deep storage? Can that be done automatically, or do I have to do it manually?

Thanks,

See “Kill Task” here: http://druid.io/docs/latest/ingestion/tasks.html (overshadowed segments are disabled in the metadata store by Druid itself).

From the kill task docs:

Kill tasks delete all information about a segment and removes it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:

{
    "type": "kill",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "interval" : <all_segments_in_this_interval_will_die!>
}
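As a sketch of how such a spec could be built and submitted: the datasource name, interval, and Overlord URL below are made-up examples, not values from this thread. Submission itself is an HTTP POST of the JSON to the Overlord's task endpoint, which is left out here so the example stays offline.

```python
import json

# Build a kill-task spec shaped like the grammar above.
# "id" is optional; Druid assigns one if it is omitted.
def build_kill_task(datasource, interval, task_id=None):
    spec = {
        "type": "kill",
        "dataSource": datasource,
        # ISO-8601 interval; all *disabled* segments inside it will be killed.
        "interval": interval,
    }
    if task_id is not None:
        spec["id"] = task_id
    return spec

spec = build_kill_task("my_datasource", "2017-01-01/2017-02-01")
payload = json.dumps(spec)
print(payload)
# To actually run it, POST this JSON (Content-Type: application/json) to
# the Overlord, e.g. http://<overlord-host>:8090/druid/indexer/v1/task
```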

The kill task accepts an interval, so even if I provide it with an interval that is currently in use (latest segment version), only the disabled segments for that interval (i.e. the old versions) will be deleted from deep storage?

Does druid.coordinator.kill.on do what is described above automatically?

We have set the following:

druid.coordinator.kill.on=true

druid.coordinator.kill.period=PT1H

druid.coordinator.kill.durationToRetain=PT0S

druid.coordinator.kill.maxSegments=1000
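A rough illustration of how these settings interact (this is not Druid's code, and the dates are invented): every kill.period the coordinator looks for unused (disabled) segments older than now minus durationToRetain and submits kill tasks for them; PT0S means any disabled segment is immediately eligible. The assumption here that the cutoff is compared against the segment interval end is a simplification.

```python
from datetime import datetime, timedelta, timezone

# Sketch of the eligibility rule: a segment can be auto-killed only if it
# is disabled (used == 0) and falls outside the retention window.
def eligible_for_kill(segment_end, used, duration_to_retain, now):
    return (not used) and segment_end <= now - duration_to_retain

now = datetime(2018, 6, 1, tzinfo=timezone.utc)
retain = timedelta(0)  # durationToRetain=PT0S: retain nothing extra

# Old, disabled (overshadowed) segment: eligible.
old_disabled = eligible_for_kill(
    datetime(2018, 1, 1, tzinfo=timezone.utc), used=False,
    duration_to_retain=retain, now=now)

# Current, still-used segment: never eligible, regardless of the window.
current_used = eligible_for_kill(
    datetime(2018, 5, 31, tzinfo=timezone.utc), used=True,
    duration_to_retain=retain, now=now)

print(old_disabled, current_used)  # True False
```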

But it still seems that when we run a re-index job whose source is Druid segments created by a Kafka index task (the new Kafka indexer, via the Kafka supervisor), the old segments are not deleted.

Thanks,

Yes, only the disabled segments get deleted. The auto-kill feature does not defend against user errors: for example, if you accidentally disable a datasource and the auto-kill thread happens to run at that point, it will go and delete everything. So be careful what you set as druid.coordinator.kill.durationToRetain; in general, auto-kill is not recommended for production systems.

The auto-kill feature internally submits kill tasks, so check whether tasks are being submitted, and look at the task logs to see what is wrong and why the segments you expect to be killed are not. Generally, it is good to keep the kill duration/interval such that it does not overlap with the intervals of currently running indexing tasks: tasks acquire locks per interval, so a kill task may not get the lock if there is a currently running indexing task on the same interval, and vice versa.
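The interval-lock contention above boils down to a simple overlap check. A sketch (not Druid code; the dates are invented examples):

```python
from datetime import date

# Two tasks contend when their intervals overlap, because task locks
# are acquired per interval.
def overlaps(a, b):
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_end and b_start < a_end

indexing = (date(2018, 6, 1), date(2018, 6, 2))   # currently running ingestion
kill_bad = (date(2018, 6, 1), date(2018, 6, 2))   # same interval: lock contention
kill_ok  = (date(2018, 1, 1), date(2018, 2, 1))   # disjoint older interval: safe

print(overlaps(indexing, kill_bad), overlaps(indexing, kill_ok))  # True False
```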