I wanted to know how to populate old segments from deep storage when the data is not available in the local cache. For example, we have 2 days of data segments cached on local disk, but what if we want to query data from 3 or 4 days ago?
In that case, check the retention rules for the datasource: those control which data the Coordinator tells the Historicals to load from HDFS (deep storage). You can widen the retention rules to cover the interval you are looking for.
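For reference, retention (load/drop) rules are per-datasource JSON documents managed through the Coordinator. A minimal sketch of a rule set that keeps the most recent four days loaded on the default tier and drops everything older (the tier name `_default_tier` is Druid's default; the period is just an example):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P4D",
    "tieredReplicants": { "_default_tier": 1 }
  },
  { "type": "dropForever" }
]
```

Rules are evaluated top to bottom and the first matching rule wins, so the `loadByPeriod` rule covers recent data and `dropForever` handles the rest.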
But in my case the data is not available because of space constraints. Two days of data comes to around 48 GB and the disk is 50 GB, so only 2 days fit on disk. In some cases, though, we need to query data older than that.
Data must be loaded onto Historicals for it to be queryable, simply because it’s the Historicals that actually serve the queries.
Which data the Historicals load is controlled by the Coordinator: you set the load/drop rules accordingly.
If it’s data you won’t query often, you might want to spin up a second tier: bring it online only when you want to query data older than usual, and take it back offline afterwards. You could also use much cheaper hardware for that tier.
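A sketch of the two-tier idea, assuming a made-up tier name of `cold`: give the cheaper Historicals their own tier in `runtime.properties`, then point the older period at that tier in the load rules.

```properties
# runtime.properties on the cheap Historicals (the tier name is arbitrary)
druid.server.tier=cold
```

```json
[
  { "type": "loadByPeriod", "period": "P2D",
    "tieredReplicants": { "_default_tier": 1 } },
  { "type": "loadByPeriod", "period": "P30D",
    "tieredReplicants": { "cold": 1 } },
  { "type": "dropForever" }
]
```

With rules like these, the last 2 days stay on the fast default tier, days 3–30 are served from the cold tier whenever its Historicals are running, and anything older is dropped from the cluster (it remains in deep storage and can be reloaded by widening the rules again).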