As per the docs:
“This tool expects users to have Druid cluster running in a ‘safe’ mode, where there are no active tasks to interfere
the segments being inserted. Users can optionally bring down the cluster to make 100% sure nothing is interfering.”
Was wondering why?
How is it different from regular batch job indexing, where the segment is written to deep storage by MapReduce, whereas here the segment is already ready?
Our use case scenario:
We have a Spark Streaming job ingesting data into Druid in real time in datacenter A, and we need to ferry the segments to datacenter B (clients access data from the Druid cluster in datacenter B).
The Spark Streaming job has to run in datacenter A (only outgoing traffic is allowed from datacenter A).
Data access has to happen from datacenter B, hence we have two Druid clusters.
Datacenter B has batch indexing running in the mornings.
We wanted to use the insert-segment tool to simply insert segments into the datacenter B Druid cluster. Since indexing in datacenter A is real-time, we will be moving segments regularly (every 5 minutes).
Will this segment insertion clash with the batch indexing? Is it safe?
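For context, this is roughly the invocation we had in mind on the datacenter B side, following the documented insert-segment-to-db usage (the tool ships with older Druid releases; I believe it was removed around 0.15). The metadata-store host, credentials, extension list, and HDFS workingDir below are placeholders for our setup, not real values:

```shell
# Hypothetical insert-segment-to-db invocation for the datacenter B cluster.
# All hosts, paths, and credentials are placeholders.
java \
  -Ddruid.metadata.storage.type=mysql \
  -Ddruid.metadata.storage.connector.connectURI='jdbc:mysql://metadata-host:3306/druid' \
  -Ddruid.metadata.storage.connector.user=druid \
  -Ddruid.metadata.storage.connector.password=druid_password \
  -Ddruid.extensions.loadList='["mysql-metadata-storage","druid-hdfs-storage"]' \
  -Ddruid.storage.type=hdfs \
  -cp "$DRUID_CLASSPATH" \
  org.apache.druid.cli.Main tools insert-segment-to-db \
  --workingDir 'hdfs://namenode:8020/druid/storage/our-datasource' \
  --updateDescriptor true
```

(On older releases the main class is io.druid.cli.Main rather than org.apache.druid.cli.Main.) The idea would be to run this every 5 minutes against the deep-storage path where the ferried segments land.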