We have a datasource with records going as far back as August of 2015. However, data appears to be slowly being lost. We now can’t query anything before September of 2015, and I don’t see segments for anything earlier than that in deep storage. We don’t have any rules configured for this datasource; is there some default limit on how long data is kept?
By default, segments should be kept forever. Can you issue a timeBoundary query to see where the intervals of your data lie?
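For reference, a timeBoundary query is just a small JSON body POSTed to the broker. A minimal sketch (the datasource name here is a placeholder):

```json
{
  "queryType": "timeBoundary",
  "dataSource": "your_datasource"
}
```

POSTing this to the broker's query endpoint (`/druid/v2/`) returns the minimum and maximum timestamps of the data that is currently queryable, which you can compare against what you expect to have in deep storage.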
In particular, Druid does not delete data from deep storage unless you specifically ask it to by submitting a kill task or by configuring auto-killing (which is off by default). So I wonder if the segments got misplaced some other way.
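For comparison, a kill task spec looks roughly like the following (the datasource name and interval are hypothetical); it may be worth scanning your task logs for anything of this shape:

```json
{
  "type": "kill",
  "dataSource": "your_datasource",
  "interval": "2015-08-01/2015-09-01"
}
```

Unless a task like this was submitted, or coordinator auto-kill was enabled in your configuration, Druid should not have removed anything from deep storage on its own.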
The timeBoundary query shows that we have not lost any more data since I sent the first email. I’m still worried that at some point we’ll start losing more, but it is possible that we lost this data due to something else. I’m not sure what that would have been, though; the only tasks we are running are merge tasks, and the data has been in the same place the whole time.