Metadata storage is full

I am using PostgreSQL for my metadata storage, and its disk is currently full. The disk is about 100 GB. What can I do in this situation? My expectation was that Druid would keep operating within this allotted disk space and purge old metadata as needed. I could not find much documentation on my options. Any tips or best practices would be greatly appreciated.

Best Regards,

Hi Sratsamy,

For the most part, Druid doesn’t purge old metadata, although you can do it yourself. Some features in the pipeline will improve this, for example https://github.com/druid-io/druid/pull/5149, which is coming in 0.12.0. I would suggest checking which table is biggest (a COUNT(*) on each table, for example) and then purging the records you don’t need. Some examples of things that should be safe to purge:

  1. Any druid_segments rows with used = 0 (a boolean false in PostgreSQL) that you won’t want to restore later

  2. Any druid_pendingSegments rows with a created_date older than the oldest currently running task [this one is being automated by #5149]

  3. Any druid_tasks rows with status SUCCESS or FAILED that are old enough that you no longer care about them
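The size check and the three purges above can be sketched as follows. This is a minimal sketch, not an official Druid tool: it only builds the SQL, which you would run yourself (in psql, or through a driver such as psycopg2, whose placeholder style is %s). It assumes Druid’s default table names, that created_date is stored as an ISO-8601 string like 2017-12-01T00:00:00.000Z, that used and active are boolean columns, and that a task row with active = false has finished (SUCCESS or FAILED). The helpers druid_timestamp and purge_statements are hypothetical names, and the age filter on unused segments is an extra safety guard, not something Druid requires. Check these assumptions against your own schema before deleting anything.

```python
from datetime import datetime, timedelta, timezone

# Standard PostgreSQL catalog query: user tables ordered by total on-disk size
# (table + indexes + TOAST), answering "which table is biggest?".
TABLE_SIZES_SQL = (
    "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size "
    "FROM pg_catalog.pg_statio_user_tables "
    "ORDER BY pg_total_relation_size(relid) DESC"
)


def druid_timestamp(ts: datetime) -> str:
    """Format a timezone-aware datetime the way created_date is ASSUMED to be stored
    (ISO-8601, UTC, millisecond precision, 'Z' suffix), so string comparison works."""
    if ts.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def purge_statements(cutoff: datetime) -> list[tuple[str, tuple[str, ...]]]:
    """Return (sql, params) pairs for the three purge candidates (hypothetical helper)."""
    c = druid_timestamp(cutoff)
    return [
        # 1. Unused segments: boolean column in PostgreSQL, so compare with false, not 0.
        #    The created_date guard only keeps recently created segments restorable.
        ("DELETE FROM druid_segments WHERE used = false AND created_date < %s", (c,)),
        # 2. Pending segments: the cutoff MUST be earlier than the oldest running task.
        ("DELETE FROM druid_pendingSegments WHERE created_date < %s", (c,)),
        # 3. Finished tasks: active = false is assumed to cover SUCCESS and FAILED.
        ("DELETE FROM druid_tasks WHERE active = false AND created_date < %s", (c,)),
    ]


if __name__ == "__main__":
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    print(TABLE_SIZES_SQL)
    for sql, params in purge_statements(cutoff):
        print(sql, params)
```

In PostgreSQL a DELETE does not shrink the table files by itself: VACUUM makes the freed space reusable for new rows, while VACUUM FULL returns it to the operating system but needs enough free disk to write a new copy of the table.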

Hi Gian,

Thank you for your reply and for the suggestions. I don’t mind writing some scripts to automate this, but is there a guide or best practice for planning metadata storage? If Druid doesn’t purge data, the metadata store will keep growing indefinitely. It sounds like I must grow the metadata storage to make my cluster operational again. How can I prevent this, or plan for it, in the future? Is there a ratio or formula for how much metadata storage I need relative to my raw data storage? I obviously undersized my metadata storage, so how do I determine the appropriate size?
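One rough way to reason about sizing, offered as an assumption rather than official guidance, is that metadata grows with the number of rows Druid writes (segments published, tasks run) rather than directly with raw data volume. The estimate is then rows per day times average row size times days retained, summed over tables, plus some overhead for indexes and table bloat. The sketch below encodes that model; TableGrowth, projected_bytes, the 1.5 overhead factor and the example numbers are all hypothetical. The real inputs would come from measuring your own cluster, for example with SELECT avg(octet_length(payload)) FROM druid_segments.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TableGrowth:
    """Growth inputs for one metadata table (hypothetical helper; measure values on your cluster)."""
    name: str
    rows_per_day: float                      # new rows written per day
    avg_row_bytes: float                     # e.g. SELECT avg(octet_length(payload)) FROM <table>
    retention_days: Optional[float] = None   # None means rows are never purged


def projected_bytes(tables: Iterable[TableGrowth], days: float, overhead: float = 1.5) -> float:
    """Rough metadata size after `days` of operation.

    Each table contributes rows_per_day * avg_row_bytes * (days of rows still kept).
    `overhead` is an ASSUMED multiplier for indexes and bloat, not a measured value.
    """
    total = 0.0
    for t in tables:
        # Without purging, every day's rows are kept; with purging, at most retention_days' worth.
        kept_days = days if t.retention_days is None else min(days, t.retention_days)
        total += t.rows_per_day * t.avg_row_bytes * kept_days
    return total * overhead


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    tables = [
        TableGrowth("druid_segments", rows_per_day=1000, avg_row_bytes=2000),
        TableGrowth("druid_tasks", rows_per_day=100, avg_row_bytes=10000, retention_days=30),
    ]
    for years in (1, 2, 3):
        gb = projected_bytes(tables, days=365 * years) / 1e9
        print(f"after {years} year(s): ~{gb:.1f} GB")
```

In this model any table left without a retention period grows linearly forever, so sizing comes down to choosing purge policies first and then budgeting for the tables you keep in full.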

I wasn’t able to find much information about how the metadata is built, but I assume it is written when the data is actually stored in deep storage. Once I purge a segment’s metadata, I would no longer be able to query that segment, and the only way to get the data back would be to re-index it or re-send it to Druid. Is this correct?

Thanks again!

Best Regards,