Any recommendations on load balancing Druid components?


I am reading the official docs, and they recommend putting an LB in front of the broker service. What about the other services? Does anyone running Druid in production have some experience to share?



One of the best things about Druid is its architecture: it gives you fine-grained control over individual services.

Let’s take it service by service.


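Brokers :

Run multiple brokers behind an LB. Any broker can serve any query, so the LB spreads the query load and routes around a broker that goes down.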
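To make the LB's job concrete, here is a toy round-robin selector that skips brokers failing a health check. It is only an illustration of the idea, not a replacement for a real LB (HAProxy, nginx, a cloud LB, etc.). The hostnames are hypothetical; 8082 is the default broker port, and `/status/health` is the health endpoint Druid processes expose, which is a natural target for the LB's health probe.

```python
from typing import Callable, List, Optional


class BrokerPool:
    """Toy round-robin selector over several Druid brokers (illustration only)."""

    def __init__(self, brokers: List[str], is_healthy: Callable[[str], bool]):
        if not brokers:
            raise ValueError("need at least one broker")
        self.brokers = brokers
        self.is_healthy = is_healthy  # e.g. a probe of health_url(broker)
        self._next = 0

    def pick(self) -> Optional[str]:
        # Try each broker at most once, starting after the last one handed out.
        for _ in range(len(self.brokers)):
            broker = self.brokers[self._next]
            self._next = (self._next + 1) % len(self.brokers)
            if self.is_healthy(broker):
                return broker
        return None  # every broker failed its health check


def health_url(broker: str) -> str:
    # Druid processes answer GET /status/health; an LB can probe this path.
    return f"{broker}/status/health"


# Usage: broker-2 is down, so requests alternate between broker-1 and broker-3.
down = {"http://broker-2:8082"}
pool = BrokerPool(
    ["http://broker-1:8082", "http://broker-2:8082", "http://broker-3:8082"],
    lambda b: b not in down,
)
print([pool.pick() for _ in range(4)])
```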


Coordinator/Overlord :

You can simply run multiple coordinator and overlord nodes; they elect a leader among themselves (through ZooKeeper by default) and work fine. You can set up an LB in front of your coordinator/overlord nodes. Any request that reaches a node that is not the current leader is redirected to the leader anyway.
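For the LB health check you have two options: probe `/status/health` so every node stays in service and rely on the redirect, or probe the leader endpoints so only the leader receives traffic. The coordinator (`/druid/coordinator/v1/isLeader`) and overlord (`/druid/indexer/v1/isLeader`) return HTTP 200 on the current leader and 404 otherwise. A minimal sketch of finding the leader that way, assuming hypothetical hostnames:

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Druid endpoints that return 200 on the current leader and 404 on other nodes.
IS_LEADER_PATH = {
    "coordinator": "/druid/coordinator/v1/isLeader",
    "overlord": "/druid/indexer/v1/isLeader",
}


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a non-leader's 404 lands here
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def find_leader(
    nodes: Iterable[str],
    service: str,
    probe: Callable[[str], int] = http_status,
) -> Optional[str]:
    """Return the first node whose isLeader endpoint answers 200."""
    path = IS_LEADER_PATH[service]
    for node in nodes:
        if probe(node + path) == 200:
            return node
    return None


# Usage against a live cluster (hypothetical hosts; 8081 is the default coordinator port):
# find_leader(["http://coord-1:8081", "http://coord-2:8081"], "coordinator")
```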

Metastore :

I would suggest putting your metastore behind an LB as well. The reason is simple: whenever you have to migrate your backend metastore, you won't have to update the common properties file on every node and restart them all. Repointing the LB at the new database endpoint saves a lot of effort.
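The point is that `druid.metadata.storage.connector.connectURI` in `common.runtime.properties` names the LB's DNS name rather than a specific database server. A small sketch that renders those lines, assuming MySQL (which also needs the `mysql-metadata-storage` extension in the load list) and a hypothetical LB hostname; PostgreSQL works the same way with a `jdbc:postgresql://` URI. The password is left out on purpose; supply it through your usual password provider.

```python
def metadata_storage_properties(lb_host: str, port: int, database: str, user: str) -> str:
    """Render the metadata-store lines of common.runtime.properties.

    connectURI points at the LB, so migrating the database only means
    changing the LB's backend, not editing and redeploying this file.
    """
    props = {
        "druid.metadata.storage.type": "mysql",
        "druid.metadata.storage.connector.connectURI": f"jdbc:mysql://{lb_host}:{port}/{database}",
        "druid.metadata.storage.connector.user": user,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical LB hostname; the same file is shipped unchanged to every node.
print(metadata_storage_properties("metastore-lb.internal", 3306, "druid", "druid"))
```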

Historical nodes :

Data is distributed across all the historical nodes, and you should not need to query historicals directly; your queries should go through a broker. So for HA you don't really need to do much for the historicals themselves. If data availability is really important, configure the datasource to keep multiple replicas of each segment loaded on different historicals, so that losing one historical does not make that data unavailable.
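Replication is set through the coordinator's load rules, via the `tieredReplicants` field. A sketch that builds (but does not send) the request setting a `loadForever` rule with two replicas in the default tier. The coordinator host and datasource name are hypothetical, and the replica count only helps if the tier has at least that many historicals.

```python
import json
import urllib.parse
import urllib.request


def replication_rule_request(
    coordinator: str, datasource: str, replicas: int, tier: str = "_default_tier"
) -> urllib.request.Request:
    """Build a POST to the coordinator rules endpoint for one datasource."""
    rules = [{"type": "loadForever", "tieredReplicants": {tier: replicas}}]
    return urllib.request.Request(
        url=f"{coordinator}/druid/coordinator/v1/rules/{urllib.parse.quote(datasource, safe='')}",
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = replication_rule_request("http://coord-1:8081", "wikipedia", 2)
# urllib.request.urlopen(req) would apply the rule on a live cluster.
```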

MiddleManagers :

Run multiple MiddleManagers. Task distribution is controlled by the overlord node, which sends each task to a MiddleManager based on your worker select strategy.
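The strategy lives in the overlord's dynamic worker config (`POST /druid/indexer/v1/worker` with a `selectStrategy` object). As a rough mental model, `equalDistribution` picks the worker with the most free capacity and `fillCapacity` packs tasks onto the busiest worker that still has room. The toy simulation below illustrates that difference under those simplified assumptions; it is not Druid's actual implementation, and the worker names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    host: str
    capacity: int  # task slots on this MiddleManager
    running: int = 0

    @property
    def available(self) -> int:
        return self.capacity - self.running


def pick_worker(workers: List[Worker], strategy: str) -> Optional[Worker]:
    candidates = [w for w in workers if w.available > 0]
    if not candidates:
        return None  # no free slots anywhere; the task would wait
    if strategy == "equalDistribution":
        return max(candidates, key=lambda w: w.available)  # spread load out
    if strategy == "fillCapacity":
        return max(candidates, key=lambda w: w.running)  # pack one worker first
    raise ValueError(f"unknown strategy: {strategy}")


def assign(workers: List[Worker], n_tasks: int, strategy: str) -> List[str]:
    """Place tasks one by one and return the host chosen for each."""
    placements = []
    for _ in range(n_tasks):
        worker = pick_worker(workers, strategy)
        if worker is None:
            break
        worker.running += 1
        placements.append(worker.host)
    return placements


def select_strategy_payload(strategy: str) -> str:
    # Body for the overlord dynamic config endpoint POST /druid/indexer/v1/worker.
    return json.dumps({"selectStrategy": {"type": strategy}})


print(assign([Worker("mm-1", 2), Worker("mm-2", 2)], 3, "equalDistribution"))
print(assign([Worker("mm-1", 2), Worker("mm-2", 2)], 3, "fillCapacity"))
```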