Uneven distribution of data on historical nodes


When loading data from S3 after Hadoop indexing, we often see an uneven distribution of data across our historical nodes. Recently one node was more than 90% full while the others were ~60% full. Is this a problem? Does it suggest a configuration problem?

Thanks for your help,
Morri Feldman


I forgot to mention that we are running imply-2.0.0 / Druid 0.9.2.

It could just mean that your cluster is balancing slowly. There’s a throttle called “maxSegmentsToMove” that you can edit if you click the pencil in the coordinator console. The default value is pretty low.
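For anyone who would rather script this than use the console pencil, here is a minimal sketch. It assumes the coordinator exposes its dynamic configuration at `/druid/coordinator/v1/config` (GET to read, POST to store) and that unspecified fields may fall back to defaults on POST, which is why it edits a copy of the current config instead of sending one field. The helper names and the coordinator URL are hypothetical.

```python
import json
import urllib.request


def with_max_segments_to_move(current_config, max_segments_to_move):
    """Return a copy of the dynamic config with only the throttle changed."""
    if max_segments_to_move < 1:
        raise ValueError("maxSegmentsToMove must be positive")
    updated = dict(current_config)  # keep every other setting as it was
    updated["maxSegmentsToMove"] = max_segments_to_move
    return updated


def build_config_request(coordinator_url, config):
    """Build (but do not send) the POST that stores the dynamic config.

    The endpoint path is an assumption; check it against your Druid version.
    """
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/druid/coordinator/v1/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To apply it, fetch the current config with a GET on the same path, pass it through `with_max_segments_to_move`, and send the result with `urllib.request.urlopen(build_config_request(...))`.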

Thanks @Gian

Unfortunately, we are already running with maxSegmentsToMove set to 300.


I have the same problem as you. Have you solved it?

Could you tell me how to solve this problem?

Thank you.

On Friday, May 19, 2017 at 1:37:22 AM UTC+8, mo…@appsflyer.com wrote:

Hi Linjing,

We never really solved it. I just checked and two of our clusters have an even distribution of data on the historical nodes, but on one cluster the nodes vary by ~20% in how much data they are holding.
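To put a number on how uneven the historicals are, a rough sketch like the following may help. It assumes you already have a list of server records with `host`, `type`, `currSize` and `maxSize` fields, which is the shape the coordinator's `/druid/coordinator/v1/servers?simple` endpoint is believed to return; treat those field names as an assumption and adjust them to what your cluster reports. The function name is hypothetical.

```python
def utilization_report(servers):
    """Return (percent_used_by_host, spread) for historical servers.

    spread is the gap, in percentage points, between the fullest and
    the emptiest historical node.
    """
    percent_used = {
        s["host"]: 100.0 * s["currSize"] / s["maxSize"]
        for s in servers
        if s.get("type") == "historical" and s.get("maxSize", 0) > 0
    }
    if not percent_used:
        return {}, 0.0
    spread = max(percent_used.values()) - min(percent_used.values())
    return percent_used, spread
```

Tracking the spread over a few coordinator runs shows whether the cluster is slowly converging or staying skewed.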