Understanding Data Load Balancing in Historicals

Hi,
I have some confusion about how load balancing occurs across Druid historicals. For example, we have 650 GB of segment data in Druid, and 3 historicals (r4.2xlarge). On each historical I am setting druid.server.maxSize to 300 GB, i.e. the 3 historicals together have 900 GB of space. My understanding is that the 650 GB of segment data should fit within that 900 GB, but each historical machine is trying to load all of the segment data. Is my understanding wrong? How does a historical load segment data?
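As a quick sanity check on the arithmetic, here is a minimal sketch. The 650 GB, 300 GB, and 3-node figures come from the question; the replication factors tried are assumptions (Druid's default load rule keeps 2 replicas of each segment):

```python
# Rough capacity check for a Druid historical tier.
# Numbers taken from the question above; the replication factors
# are assumptions for illustration.

segment_data_gb = 650   # total segment data in deep storage
per_node_max_gb = 300   # druid.server.maxSize per historical
num_historicals = 3

tier_capacity_gb = per_node_max_gb * num_historicals  # 900 GB total

def required_total_gb(replication_factor):
    """Total historical disk needed to hold every replica of every segment."""
    return segment_data_gb * replication_factor

for rf in (1, 2, 3):
    needed = required_total_gb(rf)
    fits = needed <= tier_capacity_gb
    print(f"replication={rf}: need {needed} GB, have {tier_capacity_gb} GB, fits={fits}")
```

Only with a single replica does 650 GB fit into the 900 GB tier; at 2 or 3 replicas the tier is undersized, which would match the "each historical is trying to load everything" symptom.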

This would be interesting to know about. I am aware that each historical holds ‘n’ segments and announces ownership of each segment to ZooKeeper. But on what criteria do historicals share these segments?

Some elaboration would help us understand this better!

-Thanks for the help.

On each historical I am setting druid.server.maxSize to 300 GB, i.e. the 3 historicals together have 900 GB of space. My understanding is that the 650 GB of segment data should fit within that 900 GB, but each historical machine is trying to load all of the segment data.

You would need to check the load rules (http://druid.io/docs/latest/operations/rule-configuration.html) for that datasource, and in particular the replication factor. For example, if you configured 3 replicants for the entire time range of the data, each of the 3 historicals would hold a full copy of the 650 GB — 1950 GB in total, which is well beyond the 900 GB you have provisioned.
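For illustration, a load rule along these lines (a sketch; `_default_tier` is Druid's default tier name, and 2 replicas is Druid's default) keeps 2 copies of every segment in the tier:

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": { "_default_tier": 2 }
  }
]
```

With this rule the tier needs roughly 2 × 650 GB = 1300 GB of historical disk, so for the setup in the question you would either add capacity or reduce the replicant count to 1.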

Like above, you can also check the coordinator configuration: the balancer strategy (the default is cost) determines how the coordinator distributes segments across historicals.
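For reference, the balancer strategy is part of the coordinator's dynamic configuration, which is a JSON object along these lines (a sketch; only the relevant field is shown, other fields keep their defaults):

```json
{
  "balancerStrategy": "cost"
}
```

This dynamic config is managed through the coordinator, separate from the per-datasource load rules discussed above.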

On Tuesday, July 31, 2018 at 1:19:38 PM UTC+8, Kiran Sunkari wrote: