Batch ingestion overwrites segments for the same day?


I’m ingesting CSV files using a single middle manager with Peons.

If I submit two index tasks for the same datasource and the same interval, e.g. two files with data for 1997-01-01, does the second task ‘overwrite’ the first? I can see many days in my datasource where I’m sure data from both files has been ingested, but a select query for that day only shows some of the data, all of it from the second file.

Is there another way to do this? Perhaps I should be using the append task?



Hi Richard,
Each segment has a version, and when the coordinator sees multiple segments for the same datasource and interval, it uses MVCC and loads only the latest version on the historicals.

Index tasks work by taking locks on the datasource and interval to ensure that multiple tasks run in sequential order. In your case, the second task replaces the segments generated by the first task, because the segments generated by the second task have the later version.
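
To make the versioning concrete, here is a rough Python sketch of the idea. It is not Druid code: the Segment class, the visible_segments function, the datasource name and the file names are all made up for illustration, and it assumes versions are timestamp strings that sort chronologically. It also ignores partial interval overlaps and multiple partitions per version, which the real coordinator has to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    datasource: str
    interval: str  # e.g. "1997-01-01/1997-01-02"
    version: str   # assumed: ISO-8601 timestamp, so string order is time order
    source: str    # hypothetical label for the input file, for illustration only


def visible_segments(segments):
    """Keep only the highest-version segment per (datasource, interval)."""
    latest = {}
    for seg in segments:
        key = (seg.datasource, seg.interval)
        if key not in latest or seg.version > latest[key].version:
            latest[key] = seg
    return list(latest.values())


# Two index tasks for the same day: the second one gets the later version.
first = Segment("mydata", "1997-01-01/1997-01-02", "2016-03-01T10:00:00.000Z", "file1.csv")
second = Segment("mydata", "1997-01-01/1997-01-02", "2016-03-01T10:05:00.000Z", "file2.csv")

visible = visible_segments([first, second])
```

Here visible contains only the segment built from the second file, which matches what you see in your select query.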

In general, if you want to modify and reindex the data for an interval, you need to reindex all of the data for that interval rather than indexing only the updates.
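
One way to do that for your case is to combine both files for the day into a single input and submit one index task for it. Below is a minimal Python sketch of the combining step only, outside of Druid. The merge_csv_files helper and its has_header flag are hypothetical names for illustration, and it assumes all files share the same column layout.

```python
import csv


def merge_csv_files(input_paths, output_path, has_header=False):
    """Concatenate CSV files into one file and return the number of data rows written.

    If has_header is True, the first line of each file is treated as a header:
    it is written once and every later file must have the same header.
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                if has_header:
                    file_header = next(reader, None)
                    if file_header is None:
                        continue  # empty file, nothing to copy
                    if header is None:
                        header = file_header
                        writer.writerow(header)
                    elif file_header != header:
                        raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

You would then point a single index task at the merged file, so that the one set of segments produced for the interval contains the rows from both files.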

Thanks very much.