Loading data into historical nodes is very slow

We are running a 0.10.1-rc3-SNAPSHOT build.

When I started a new historical node that needs to load 3406 segments (around 478 GB), it took around 7 hours. The deep storage is Google Cloud Storage, which has more than enough bandwidth to do this much, much faster.

The problem seems to be that the coordinator doesn’t tell the historical node quickly enough which segments to load. The historical node is constantly idle, waiting for new segment assignments.
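For reference, two settings that presumably affect how fast segments get assigned and pulled (property names are from the Druid configuration docs; the values here are just examples, not what we run):

```properties
# coordinator runtime.properties — how often the coordinator runs its
# segment-assignment cycle (default is PT60S)
druid.coordinator.period=PT30S

# historical runtime.properties — how many segments a historical pulls
# from deep storage in parallel
druid.segmentCache.numLoadingThreads=10
```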

Is there any way to speed this up?

Why do we need this? We are trying to use Google Cloud preemptible instances for historical nodes. These instances are automatically deleted every 24 hours, but they are much cheaper. You can put them in an instance group that automatically recreates them each time they are killed. So if Druid can load its segments quickly enough, this is very doable.

Did the loading happen faster with 0.10.0?

Are the segments being loaded new replicas of already-existing segments, or segments being moved from somewhere else? Both of those operations are throttled, and you might need to raise the throttles to make loading go faster. The idea behind the throttles is that you wouldn’t want to spend all your bandwidth loading and moving segments that are already available somewhere else, so you’d want to do it gradually.
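As a sketch, those throttles live in the coordinator dynamic config, which you can update by POSTing JSON to `/druid/coordinator/v1/config` on the coordinator. The values below are illustrative, not recommendations:

```json
{
  "maxSegmentsToMove": 50,
  "replicationThrottleLimit": 100,
  "replicantLifetime": 15
}
```

`replicationThrottleLimit` caps how many replicas can be created per coordinator run, and `maxSegmentsToMove` caps balancing moves, so raising them lets the coordinator hand out work faster at the cost of more deep-storage traffic.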

I didn’t test it with 0.10.0, as we have already upgraded our cluster and I don’t want to downgrade again.

We only use the default rule, which is set to load all segments forever with a replication factor equal to the number of servers we have. I don’t see any segments being moved.
The only thing I see that might affect it is an issue I already reported: https://github.com/druid-io/druid/issues/4609
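For context, a load-forever rule like ours looks roughly like this (the tier name and replicant count here are examples; ours is set to match our server count):

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": { "_default_tier": 3 }
  }
]
```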