Coordinator Segment Handoff - Is it possible to prioritize new segments from ingestion tasks?

Apache Druid Version: 0.23.0

Sometimes, when we have infrastructure issues, we have to decommission some Historicals and temporarily shorten the retention period so that the remaining ones do not run out of disk. When we bring the Historicals back, re-apply the old retention period, and reload the dropped data, our ingestion tasks wait much longer for handoff. Some of them fail because handoff does not finish within completionTimeout, even when that is set to a higher value.

Is it possible to adjust Coordinator behavior so that new segment handoffs from currently running ingestion tasks take priority over reloading dropped data? That way running ingestion tasks would not be affected, and some segments would still be reloaded in the meantime.

Maybe a compaction job using dropExisting?

I have not tested any of this, but here are some thoughts:

This section of the docs has a good review of the dynamic configuration parameters that throttle the rebalancing process, which might help to address the problem.
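As a rough illustration of that throttling idea, here is a minimal, untested Python sketch that reads the Coordinator's current dynamic configuration, overrides a few throttling parameters, and prepares the update. The Coordinator address and the chosen values are assumptions for illustration only, not recommendations; the parameter names (maxSegmentsToMove, replicationThrottleLimit, maxSegmentsInNodeLoadingQueue) are taken from the Coordinator dynamic configuration docs, so please check them against your Druid version before using anything like this.

```python
import json
import urllib.request

# Assumption: the Coordinator is reachable here; replace with your own address.
COORDINATOR_URL = "http://localhost:8081"
CONFIG_PATH = "/druid/coordinator/v1/config"

# Illustrative values only; tune them for your own cluster.
THROTTLE_OVERRIDES = {
    "maxSegmentsToMove": 1,                # fewer balancing moves per run
    "replicationThrottleLimit": 5,         # fewer replicas loaded at one time
    "maxSegmentsInNodeLoadingQueue": 100,  # keep each Historical's load queue short
}


def merge_dynamic_config(current, overrides):
    """Return a copy of the current config with the overrides applied."""
    merged = dict(current)
    merged.update(overrides)
    return merged


def fetch_dynamic_config(base_url=COORDINATOR_URL):
    """GET the current Coordinator dynamic configuration as a dict."""
    with urllib.request.urlopen(base_url + CONFIG_PATH) as resp:
        return json.load(resp)


def build_update_request(config, base_url=COORDINATOR_URL):
    """Build (but do not send) the POST request that submits the config."""
    return urllib.request.Request(
        base_url + CONFIG_PATH,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    current = fetch_dynamic_config()
    updated = merge_dynamic_config(current, THROTTLE_OVERRIDES)
    urllib.request.urlopen(build_update_request(updated)).close()
```

Merging into the fetched configuration, rather than posting only the overrides, is a cautious choice so that settings you have already customized are sent back unchanged. Once the reload has caught up, the same helper can be used to put the original values back.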

Also, in the Druid Configuration reference, two parameters seemed relevant to me:

  • druid.coordinator.period which controls how often coordination processes run (default is 1 min)
  • druid.coordinator.loadqueuepeon.repeatDelay which controls how often the load queue is managed (default is 50ms)

My theory is that you want to keep the load queue peon responsive so that it can do the handoff, but slow down the overall coordination process so that rebalancing runs less often. Combined with the throttling above, that might give you a form of “prioritization”. It does sound like a good idea for a feature, though. I was about to suggest that you add it to GitHub, but I see you already did. Thank you!