Batch Ingestion, overwriting segments


I was looking over the IngestTask code and I see that finished segments are pushed and then announced at the end.

I was wondering, what about existing segments? Who takes care of them? Do I have to unannounce and delete them before I create new segments?

For example, the realtime node creates 2 segments for an hour, one for each shard. On re-index I know that I can fit everything into 1 shard. So when I upload/announce the new segment, how will the old shards be treated?

The coordinator automatically takes care of marking overshadowed segments as obsolete and removing them from the cluster. Segments are never deleted from deep storage unless you explicitly tell Druid to do so. You can create new segments from batch ingestion and replace entire time slices in Druid. Druid will take care of querying the most recent data and dropping the old data from the cluster.

Druid uses MVCC and each segment has a version (the version is the timestamp of when the segment was created). So if you create new segments for a time interval with a newer version, all queries for that time interval will go to the segment with the most recent version.
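
To make the versioning rule concrete, here is a rough sketch in Python. It is not Druid's actual code: the `Segment` class and `split_visible` function are made up for illustration, it only handles segments whose intervals match exactly (the real timeline also deals with partial overlaps), and it assumes versions are ISO-8601 timestamp strings that can be compared as plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified stand-in for a Druid segment descriptor.
    interval: str   # e.g. "2015-01-01T00:00/2015-01-01T01:00"
    version: str    # ISO-8601 timestamp of when the segment was created
    partition: int  # shard number within the interval


def split_visible(segments):
    """Split segments into (visible, overshadowed).

    For each interval, only segments carrying the newest version stay
    visible; every segment with an older version is overshadowed.
    """
    by_interval = defaultdict(list)
    for seg in segments:
        by_interval[seg.interval].append(seg)

    visible, overshadowed = [], []
    for segs in by_interval.values():
        # Same-format ISO-8601 strings sort chronologically.
        newest = max(seg.version for seg in segs)
        for seg in segs:
            (visible if seg.version == newest else overshadowed).append(seg)
    return visible, overshadowed


hour = "2015-01-01T00:00/2015-01-01T01:00"
realtime_0 = Segment(hour, "2015-01-01T00:00:00.000Z", 0)
realtime_1 = Segment(hour, "2015-01-01T00:00:00.000Z", 1)
reindexed = Segment(hour, "2015-01-02T12:00:00.000Z", 0)

visible, overshadowed = split_visible([realtime_0, realtime_1, reindexed])
```

In the case from the question, `visible` ends up holding only the single re-indexed segment, and both realtime shards land in `overshadowed`, even though the shard counts differ between the two versions.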

The version! Of course. Thank you! :slight_smile: