How to delete Unused Segments from Metadata Storage?

Hi all,

I have a single-node Druid instance running in our production environment, with MySQL configured as the metadata storage. A compaction task with day granularity runs every day.

Running a compaction task automatically marks old segments as unused in metadata storage.

My question is: how can I delete those unused segments from the metadata store? Should I run a kill task for a given interval?

If yes, would it also delete the used segments, since that interval contains both used and unused segments?

If not, can I edit the Druid metadata database directly and delete the unused segment rows?

Example:

mysql [druid]> delete from druid_segments where datasource = 'some_datasource' AND start like '2020-02%' AND used = 0;


Please share any knowledge and experience you have with this.

TIA,

Hi Zaid,

The short version is that a kill task will do what you want: https://druid.apache.org/docs/latest/ingestion/data-management.html#delete
When the kill task runs, it deletes the unused segment files that match the task parameters from deep storage. Once that is done, it removes those segments from the metastore DB.
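For reference, a kill task is submitted like any other Druid task: a JSON spec POSTed to the Overlord's `/druid/indexer/v1/task` endpoint. The sketch below is a minimal illustration, assuming a kill spec with only `type`, `dataSource` and `interval`, and assuming the Overlord is reachable at `http://localhost:8081` (the combined Coordinator/Overlord port in the single-server quickstart; check your own config). The helper names `build_kill_task` and `submit_task` are hypothetical, and the interval mirrors the February 2020 example from the question.

```python
import json
import urllib.request

# Assumed Overlord address; adjust to your deployment.
OVERLORD_URL = "http://localhost:8081"


def build_kill_task(datasource: str, interval: str) -> dict:
    """Build a minimal kill task spec for one datasource and ISO-8601 interval.

    Per the Druid docs, a kill task only removes segments that are already
    marked unused; used segments in the same interval are not touched.
    """
    return {
        "type": "kill",
        "dataSource": datasource,
        "interval": interval,
    }


def submit_task(spec: dict, overlord_url: str = OVERLORD_URL) -> str:
    """POST the spec to the Overlord task endpoint and return the task ID."""
    req = urllib.request.Request(
        f"{overlord_url}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The Overlord replies with {"task": "<task id>"}.
        return json.load(resp)["task"]


if __name__ == "__main__":
    spec = build_kill_task("some_datasource", "2020-02-01/2020-03-01")
    print(json.dumps(spec))
    # Uncomment to submit against a running cluster:
    # print(submit_task(spec))
```

Keeping spec construction separate from submission makes it easy to inspect the JSON before anything is deleted; the same spec can also be pasted into the web console's task submission dialog.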

You probably do NOT want to manually delete segments from the metastore DB, as that would “orphan” the underlying segment files. Once they’re no longer listed in the metastore, you won’t be able to clean them up with kill tasks.

Thanks,
Max

This article also explains it very well:

https://medium.com/nmc-techblog/data-retention-and-deletion-in-apache-druid-74ffd12398a8

Hey Max,

Thank you for this precise answer! I'll definitely follow your advice!

Cheers!!

🙂

Hi Edwin,

Thank you for sharing this article 🙂