I have 2 data sources in Druid. I recreated the Druid cluster, and since then the Coordinator console shows the data sources as <= 99% available. When I hover over a data source in the Coordinator console, it says 80% to load until available.
What does this mean? The status has been the same for the last few hours, even though the data is not that large. How do I make sure the data source becomes fully available?
What are the retention rules? What time interval does the data you ingested cover?
The data sources do not have any retention rules. I am not ingesting any new data. The data was already there; I just recreated the Druid cluster.
What do you mean by “recreating druid cluster”?
There will be some default retention rules for the datasources if you have not explicitly set them. I faced a similar issue where I had ingested data for January to March, but my default retention rule was 30 days, so the data was not loaded. Also, do you see any message of the form “x number of segments to load” on the Coordinator UI?
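For anyone who wants to check the effective rules without the UI, here is a minimal sketch. It assumes the Coordinator is reachable at `http://localhost:8081` (a placeholder, replace it with your own address) and that the `/druid/coordinator/v1/rules/{dataSource}?full` endpoint returns the datasource rules together with the defaults. The helper names are made up for illustration. Rules are evaluated in order and the first match wins, so a period-based load rule followed by a drop rule would explain older data not being loaded.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder Coordinator address; replace with your own.
COORDINATOR = "http://localhost:8081"


def fetch_rules(datasource, base=COORDINATOR):
    """Fetch the retention rules for one datasource, defaults included."""
    url = f"{base}/druid/coordinator/v1/rules/{datasource}?full"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_rules(rules):
    """Return (type, period-or-interval) per rule, in evaluation order."""
    return [
        (rule.get("type"), rule.get("period") or rule.get("interval"))
        for rule in rules
    ]


def first_rule_is_load_forever(rules):
    """True when the first rule to be evaluated is a loadForever rule."""
    return bool(rules) and rules[0].get("type") == "loadForever"
```

Calling `summarize_rules(fetch_rules("my_datasource"))` (with a hypothetical datasource name) lists the rules in the order the Coordinator applies them.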
It has the default rule, i.e. ‘Load Forever’, so it should load everything.
“Recreating druid cluster” -> Sorry for the confusion. I had deployed a Druid cluster on AWS (each process on its own node). I deleted that cluster and recreated it, pointing to the same deep storage and metadata storage (neither was modified in any way). Once the cluster is deployed, I assume Druid starts loading segments from deep storage and updates the data source availability accordingly (which can be seen on the Coordinator console). One of the datasources still shows availability <= 99%, even 8 hours after the cluster was deployed, although it has just a few MB of data in deep storage. So, how do I ensure that the data source is fully available?
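The availability shown in the console can also be read from the Coordinator API, which may be easier to poll after a redeploy. The sketch below assumes a placeholder Coordinator address of `http://localhost:8081`, that `/druid/coordinator/v1/loadstatus` returns a percentage per datasource, and that `/druid/coordinator/v1/loadstatus?simple` returns the number of segments still to load. The function names are illustrative only.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder Coordinator address; replace with your own.
COORDINATOR = "http://localhost:8081"


def fetch_json(path, base=COORDINATOR):
    """GET a Coordinator API path and decode the JSON body."""
    with urlopen(base + path, timeout=10) as resp:
        return json.load(resp)


def not_fully_loaded(load_status):
    """Return {datasource: percent} for datasources below 100% loaded."""
    return {ds: pct for ds, pct in load_status.items() if pct < 100.0}


def segments_pending(simple_status):
    """Return {datasource: count} for datasources with segments left to load."""
    return {ds: n for ds, n in simple_status.items() if n > 0}


def main():
    """Print datasources that are not yet fully available."""
    print(not_fully_loaded(fetch_json("/druid/coordinator/v1/loadstatus")))
    print(segments_pending(fetch_json("/druid/coordinator/v1/loadstatus?simple")))
```

Running `main()` against a live cluster prints two dictionaries; if both are empty, every datasource is reported as fully loaded.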
Looking at the Historical logs might give some hints.
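As a rough way to narrow down a large Historical log, the sketch below pulls out lines containing an `ERROR` or `WARN` level token. It assumes the level appears as a standalone word on each line, which depends on your logging configuration, and the log path in the usage note is hypothetical.

```python
import re
from pathlib import Path


def problem_lines(log_text, levels=("ERROR", "WARN")):
    """Return the lines that contain one of the given level tokens as a whole word."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, levels)) + r")\b")
    return [line for line in log_text.splitlines() if pattern.search(line)]


def scan_log(path, levels=("ERROR", "WARN")):
    """Read a log file and return its ERROR/WARN lines."""
    text = Path(path).read_text(errors="replace")
    return problem_lines(text, levels)
```

For example, `scan_log("/path/to/historical.log")` returns the matching lines, which can then be checked for segment loading failures.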
Thank you. Will take a look.