Idempotent delta ingestion

According to the documentation:

It is STRONGLY RECOMMENDED to provide list of segments in dataSource inputSpec explicitly so that your delta ingestion task is idempotent.
Could you please elaborate a bit on this? What exactly happens if I don’t? I tried it without the segments, and it looks idempotent to me. Was I just lucky?
Or maybe the data I was using had some special property that makes it idempotent anyway, and it won’t work with other data or another ingestion task?

Máté Szabó

The docs are warning you about a retry scenario. If you do a delta ingestion using a “multi” + “dataSource” inputSpec, the task could publish its segments and then fail afterwards, for example because it can’t complete one of its shutdown steps, like archiving its logs. If you then retry the task, it reads the segments it just published and adds the new data to them again, so you end up loading the new data twice. It’s not very likely, but it can happen. Providing a specific list of segments prevents that scenario.
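
To make the failure mode concrete, here is a toy Python model of it. This is only a sketch and not Druid code: `run_delta_task`, `SegmentsChanged`, `demo`, and the `store` dict are all made-up names, and a single integer version stands in for the explicit segment list. It also assumes that a task with a pinned list refuses to run once the published segments no longer match that list, which is my reading of the docs.

```python
# Toy model only: a "datasource" is a dict with a version number and a list of rows.
# None of these names are Druid APIs; the version number stands in for a segment list.

class SegmentsChanged(Exception):
    """Raised when the pinned input no longer matches what is published."""


def run_delta_task(store, delta_rows, pinned_version=None):
    """Merge the existing rows with delta_rows and publish the result as a new version."""
    current = store["version"]
    if pinned_version is not None and pinned_version != current:
        # Assumption: with an explicit list, a mismatch makes the task fail
        # instead of silently reading whatever is published now.
        raise SegmentsChanged(f"expected version {pinned_version}, found {current}")
    store["rows"] = store["rows"] + delta_rows
    store["version"] = current + 1


def demo(pin):
    """Run the same task spec twice, as a blind retry would, and return the final rows."""
    store = {"version": 1, "rows": ["a", "b"]}
    delta = ["c"]
    pinned = store["version"] if pin else None

    # First attempt: publishes, then (we pretend) fails during shutdown.
    run_delta_task(store, delta, pinned)

    # Retry of the exact same spec.
    try:
        run_delta_task(store, delta, pinned)
    except SegmentsChanged:
        pass  # the retry is rejected, so nothing is loaded a second time
    return store["rows"]


print(demo(pin=False))  # ['a', 'b', 'c', 'c']  (delta loaded twice)
print(demo(pin=True))   # ['a', 'b', 'c']       (retry is harmless)
```

Without the pin, the retry picks up whatever is currently published, which already contains the delta. With the pin, the retry can only run against the exact input it was written for.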