How does Druid handle queries on archived data (tiered storage)?

Hey Folks,

We have a requirement to store 1 week of data in the “hot” tier and up to 1 month of data in the “cold” tier. We don’t want to retain data older than 1 month on Historical nodes, and I understand that we can achieve that by configuring data retention rules (dropByPeriod).
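For illustration, here is a minimal sketch of one common way to express this layout in Druid's JSON rule format: two `loadByPeriod` rules followed by `dropForever`. It is offered as an alternative to the `dropByPeriod` approach mentioned above. The Coordinator evaluates rules top to bottom and applies the first rule that matches a segment. The sketch assumes your Historicals are assigned to tiers named `hot` and `cold` via `druid.server.tier`. Replicant counts are placeholders. The `placement` helper is hypothetical and not part of Druid. It only mimics first-match evaluation by segment age, approximating `P1M` as 30 days. The printed list is the kind of payload you would submit to the Coordinator's rules endpoint (`/druid/coordinator/v1/rules/{dataSourceName}`).

```python
import json
from datetime import datetime, timedelta, timezone

# Retention rules in Druid's JSON rule format, evaluated top to bottom;
# the first matching rule wins. Assumption: Historicals are configured with
# druid.server.tier=hot and druid.server.tier=cold; replicant counts are placeholders.
RULES = [
    {"type": "loadByPeriod", "period": "P1W", "includeFuture": True,
     "tieredReplicants": {"hot": 1}},
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True,
     "tieredReplicants": {"cold": 1}},
    {"type": "dropForever"},
]

# Hypothetical simplification: ISO-8601 periods mapped to a fixed number of days.
PERIOD_DAYS = {"P1W": 7, "P1M": 30}


def placement(segment_end: datetime, now: datetime) -> str:
    """Hypothetical helper: mimic first-match rule evaluation for one segment.

    With includeFuture=True, a loadByPeriod rule matches when the segment's
    interval ends after (now - period), i.e. it overlaps the load window.
    """
    for rule in RULES:
        if rule["type"] == "dropForever":
            return "dropped (deep storage only)"
        window_start = now - timedelta(days=PERIOD_DAYS[rule["period"]])
        if segment_end > window_start:
            # Return the (single) tier this rule loads the segment into.
            return next(iter(rule["tieredReplicants"]))
    return "no rule matched"


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    print(json.dumps(RULES, indent=2))  # payload to submit for the data source
    for days_old in (2, 20, 90):
        print(days_old, "days old ->", placement(now - timedelta(days=days_old), now))
```

Because the last rule drops everything that the load rules did not match, segments older than a month are unloaded from Historicals but, as noted below, stay in deep storage.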

I understand that data dropped by retention rules still remains in deep storage. We want to keep that data in deep storage for 7 years and delete it from deep storage after that.

Question: every once in a while, the customer may issue a query on data older than 1 month. How will Druid handle that query? Will Druid fetch the data the query needs from deep storage and delete it after serving the query, or will it just report that no data is available for that range? How do you handle such scenarios? We want data older than 1 month to remain queryable through Druid.

Looking forward to hearing your experiences.



Hi Mahtab,
Welcome to the Apache Druid community.
Druid does not currently have a Historical tier that queries directly from deep storage. There is an open issue about this in the GitHub repo here:

Feel free to add your comments there and upvote it.