I have a data source with a load rule covering the past two months, followed by a drop rule for everything older than two months. So only segments from the past two months are loaded in the cluster.
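For readers unfamiliar with this setup, here is a minimal sketch of what such a rule chain could look like, built as a Python list ready to be posted to the Coordinator rules endpoint. The tier name, replica count, and the use of `dropForever` as the second rule are assumptions for illustration; the actual rules on the cluster may differ.

```python
import json


def retention_rules(load_period="P2M", tier="_default_tier", replicas=2):
    """Build a rule chain: load the recent period, drop everything else.

    Rules are evaluated in order, so the drop rule only applies to
    segments that the load rule did not match.
    """
    return [
        {
            "type": "loadByPeriod",
            "period": load_period,  # ISO 8601 period, here two months
            "includeFuture": True,
            "tieredReplicants": {tier: replicas},  # assumed tier and replica count
        },
        {"type": "dropForever"},  # assumed form of the drop rule
    ]


def rules_url(coordinator, datasource):
    """URL of the Coordinator endpoint that accepts the rule list via POST."""
    return f"{coordinator}/druid/coordinator/v1/rules/{datasource}"


# Hypothetical host and data source name, for illustration only.
print(rules_url("http://localhost:8081", "events"))
print(json.dumps(retention_rules(), indent=2))
```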
However, I want to create another data source that re-indexes this one and processes the segments older than two months.
To achieve this, it seems I need to flag those segments as used in my original data source, which would cause them to be loaded onto the cluster (and I do not want this). I want any segments older than two months to be reprocessed in the new re-indexing data source. Is that possible?
This relates to Apache Druid 0.20.0.
What are you trying to achieve?
Is this about different time granularity in prior months?
It is possible to create compaction tasks so that older time ranges use a coarser queryGranularity, e.g. going from hour to day.
There’s more info about this here: Compaction · Apache Druid
I want to have two different data sources with different time granularity. I know I can compact older segments with different criteria, such as a coarser roll-up, but I do not want to touch the original data source. I am just trying to create a new data source.
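As a rough sketch of the kind of task this implies, the function below builds a native batch ingestion spec that reads one data source through the `druid` input source and writes a second data source with a coarser queryGranularity. The data source names, interval, dimension list, and metric definitions are placeholders, and the spec has not been checked against a 0.20.0 cluster.

```python
import json


def reindex_spec(source, target, interval, dimensions, metrics,
                 query_granularity="day", segment_granularity="month"):
    """Build an index_parallel spec that re-indexes `source` into `target`."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": target,
                # Existing segments expose their timestamp as __time in millis.
                "timestampSpec": {"column": "__time", "format": "millis"},
                "dimensionsSpec": {"dimensions": dimensions},
                "metricsSpec": metrics,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": query_granularity,
                    "rollup": True,
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Read rows back out of the original data source's segments.
                "inputSource": {
                    "type": "druid",
                    "dataSource": source,
                    "interval": interval,
                },
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical names and columns, for illustration only.
spec = reindex_spec(
    source="events",
    target="events_daily",
    interval="2020-01-01/2020-06-01",
    dimensions=["country"],
    metrics=[{"type": "longSum", "name": "count", "fieldName": "count"}],
)
print(json.dumps(spec, indent=2))
```

A spec like this would be submitted to the Overlord at `/druid/indexer/v1/task`. The original data source is only read, never rewritten.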
You are correct in your original post. You will need to mark them as used for those segments to be reindexed.
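For reference, marking segments as used can be done through the Coordinator's `markUsed` endpoint. The sketch below only builds the HTTP request for a given interval; the host, data source name, and interval are placeholders. Keep in mind that the data source's retention rules still apply to segments once they are marked as used again.

```python
import json
import urllib.request


def mark_used_request(coordinator, datasource, interval):
    """Build a POST request that marks all segments in `interval` as used."""
    url = f"{coordinator}/druid/coordinator/v1/datasources/{datasource}/markUsed"
    body = json.dumps({"interval": interval}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
req = mark_used_request("http://localhost:8081", "events", "2020-01-01/2020-06-01")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```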
I was going to ramble on about tiering but then I read your question again and now I think I understand properly (!) that you want to do a one-time re-indexing of some old data that is currently not loaded on the cluster?
+1 to @Vijeth_Sagar in that case.
You could go right into the weeds and use APIs to find your unused segments, then build some kind of cloning process behind the scenes, but even the thought of that scares me.