Why should Druid run in safe mode for the insert-segment-to-db tool?

As per the docs:

“This tool expects users to have Druid cluster running in a “safe” mode, where there are no active tasks to interfere
the segments being inserted. Users can optionally bring down the cluster to make 100% sure nothing is interfering.”

I was wondering why.

How is it different from regular batch indexing, where the segment is written to deep storage by MapReduce, whereas here the segment is already present in deep storage?

Our use case:

We have a Spark Streaming job ingesting data in real time into Druid in datacenter A, and we need to ferry the segments to datacenter B (clients access data from the Druid cluster in datacenter B).

The Spark Streaming job has to run in datacenter A (only outgoing traffic is allowed from datacenter A).
Data access has to happen from datacenter B, hence we have two Druid clusters.
Datacenter B has batch indexing running in the mornings.

We wanted to use the insert-segment-to-db tool to simply insert segments into the datacenter B Druid cluster. Since indexing in datacenter A is real-time, we will be moving segments regularly (every 5 minutes).

Will this segment insertion clash with the batch indexing? Is it safe?

I can only answer for myself, but I've used that tool to move segments from an old Druid cluster to a new one while both were running. After copying the segments to the deep storage of the new cluster, I used the tool to re-scan the entire segments location for a single data source. The cluster was running and some data was shadowed (this was planned, as the shadowed segments were old experiments).
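For reference, the re-scan step looks roughly like the following. This is a sketch based on the documented insert-segment-to-db invocation; the metadata-store credentials, extensions list, classpath, and the HDFS working directory are all placeholders you would replace with your own values:

```shell
# Re-scan a deep storage location and register the discovered segments
# in the metadata store (values below are examples, not real settings).
java \
  -Ddruid.metadata.storage.type=mysql \
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid \
  -Ddruid.metadata.storage.connector.user=druid \
  -Ddruid.metadata.storage.connector.password=druid \
  -Ddruid.extensions.loadList='["mysql-metadata-storage","druid-hdfs-storage"]' \
  -Ddruid.storage.type=hdfs \
  -cp "$DRUID_CLASSPATH" \
  io.druid.cli.Main tools insert-segment-to-db \
  --workingDir hdfs://namenode:8020/druid/storage/myDataSource \
  --updateDescriptor true
```

The tool walks the working directory, reads each segment's descriptor, and inserts the corresponding rows into the metadata store, which is why it only touches one data source per run.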

One issue I found with shadowing old segments: after moving from hourly to daily segments, two random hours across two days weren't shadowed. The solution was to set those hourly segments to used=0 in the metadata database, and the problem was solved.
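Marking a segment unused is a single update against the metadata store. A minimal sketch, assuming a MySQL metadata store with the default druid_segments table name; the data source name, credentials, and the interval of the stray hourly segment are placeholders:

```shell
# Mark one leftover hourly segment as unused so the daily segment takes over.
# Note: `end` is a reserved word in MySQL, hence the backticks.
mysql -u druid -p druid -e "
  UPDATE druid_segments
  SET used = 0
  WHERE dataSource = 'myDataSource'
    AND start = '2017-01-01T05:00:00.000Z'
    AND \`end\` = '2017-01-01T06:00:00.000Z';"
```

The coordinator picks up the change on its next metadata poll and drops the segment from the historicals, so no restart is needed.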

I hope this helps.