My team is rebuilding our Druid cluster one server at a time. Each time we remove and then re-add a historical node, we have to replicate ~4 TB of data. All of this data seems to go directly to the new node (I assume the re-balancing algorithm is at play here), which makes sense. However, this causes a major backlog of segments waiting to be loaded and becomes a bottleneck, because we can't start the next rebuild until everything is fully replicated. Is there a better way to go about this to speed up our process?
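For context, this is roughly the coordinator dynamic config we've been eyeing to widen the loading throttle while a node refills. This is just a sketch, the values are guesses for our hardware rather than recommendations, and it assumes these knobs behave the way the docs describe (it would be POSTed to `/druid/coordinator/v1/config`):

```json
{
  "maxSegmentsToMove": 100,
  "replicationThrottleLimit": 500,
  "maxSegmentsInNodeLoadingQueue": 1000
}
```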
One idea we had is to let the replication of a server's data be distributed across the rest of our nodes, and not re-add the replaced server until that replication is complete. Re-adding the server would kick off balancing, at which point we could remove the next server in our list, causing replication to spread across the remaining nodes again. However, I have a feeling most of the data will end up replicated onto that first server we replaced, because it holds much less data than the rest of our servers, once again introducing the same bottleneck of throughput from HDFS to a single historical.
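A hedged sketch of how we imagine that drain step, assuming our Druid version supports the coordinator's `decommissioningNodes` dynamic config (the host:port below is made up for illustration): marking the historical as decommissioning should make the coordinator move its segments onto the remaining nodes before we take it down, rather than us removing it cold.

```json
{
  "decommissioningNodes": ["historical-01.example.com:8083"],
  "decommissioningMaxPercentOfMaxSegmentsToMove": 70
}
```

As with the throttle settings, this would go to `/druid/coordinator/v1/config`; once the node is empty we'd rebuild it, clear it from the list, and re-add it.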
If anyone has experience rebuilding a cluster one server at a time, I'd love to hear how you approached it and avoided the bottleneck I'm currently facing.