We have a requirement to store 1 week of data in the “hot” tier and up to 1 month of data in the “cold” tier. Data older than 1 month should not be retained on Historical nodes. I understand that we can achieve this by configuring data retention rules (dropByPeriod).
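To make the setup concrete, here is a minimal sketch of the rule chain I have in mind, written as a small Python script that prints the JSON rule list. The tier names come from the requirement above; the replica counts and the function name `build_retention_rules` are placeholder assumptions. As far as I can tell, `dropByPeriod` matches data inside the given period rather than data older than it, so this sketch ends the chain with `dropForever` instead. Please correct me if that reading is wrong.

```python
import json

# Tier names come from the requirement above; they must match the tiers
# configured on the Historical nodes. Replica counts are assumptions.
HOT_TIER = "hot"
COLD_TIER = "cold"


def build_retention_rules(hot_replicas=2, cold_replicas=1):
    """Return the ordered rule chain.

    Rules are evaluated top to bottom and the first rule that matches
    a segment is the one applied to it.
    """
    return [
        # Most recent week: load on the hot tier.
        {
            "type": "loadByPeriod",
            "period": "P1W",
            "includeFuture": True,
            "tieredReplicants": {HOT_TIER: hot_replicas},
        },
        # Not matched above but within the last month: load on the cold tier.
        {
            "type": "loadByPeriod",
            "period": "P1M",
            "includeFuture": True,
            "tieredReplicants": {COLD_TIER: cold_replicas},
        },
        # Everything older: unload from Historicals (segments stay in deep storage).
        {"type": "dropForever"},
    ]


if __name__ == "__main__":
    # The printed JSON is what would be set as the datasource's retention rules.
    print(json.dumps(build_retention_rules(), indent=2))
```

The resulting JSON list could be pasted into the retention rule editor in the web console or submitted through the Coordinator rules API for the datasource.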
I also understand that data dropped by retention rules still remains in deep storage. We want to retain that data in deep storage for 7 years and delete it from deep storage after that.
Question: Every once in a while the customer may issue a query on data older than 1 month. How will Druid handle that query? Will Druid fetch the relevant data from deep storage and delete it after serving the query, or will it just say that no data is available for that range? How do you handle such scenarios? We want data older than 1 month to remain queryable through Druid.
Looking forward to hearing your experiences.