Multiple broker nodes

Good morning everybody.

I was unable to find any information about using multiple broker nodes. If I have, say, 2 broker nodes, how can I configure clients to use all of them, e.g. in a round-robin way? Should each client decide for itself which one to use, or is there some standard mechanism in Druid itself?

Best regards


I have set up an external load balancer in round-robin fashion. I don't think this is addressed in the Druid docs, and it should be: external services are not part of the cluster and don't use ZooKeeper to discover the other brokers, so they can't rely on it for client-side load balancing, for example.
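For the client-side option the question asks about, here is a minimal Python sketch of round-robin selection over a fixed broker list. The broker hostnames are hypothetical, and it assumes the default broker port 8082 and the native query endpoint `/druid/v2/`. It does no health checking, which is one of the things an external load balancer gives you for free.

```python
import itertools
import threading
from typing import Sequence


class BrokerRoundRobin:
    """Hands out broker base URLs in round-robin order (thread-safe)."""

    def __init__(self, brokers: Sequence[str]):
        if not brokers:
            raise ValueError("need at least one broker")
        self._cycle = itertools.cycle(list(brokers))
        self._lock = threading.Lock()  # next() on a shared cycle is guarded

    def next_broker(self) -> str:
        with self._lock:
            return next(self._cycle)


# Hypothetical broker addresses on the default broker port.
brokers = BrokerRoundRobin(["http://broker1:8082", "http://broker2:8082"])
for _ in range(4):
    # Each query would be POSTed to the native query endpoint of the chosen broker.
    print(brokers.next_broker() + "/druid/v2/")
```

Each client process keeps its own rotation, so load is only spread evenly on average across many clients.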

It would be nice to have a Druid Broker Balancer node that load balances between broker nodes, so you don't have to depend on an external load balancer to achieve HA.

But I have noticed this doesn't give you full HA: when some historical nodes (or other query nodes) go down, regardless of whether you have multiple replicas, for some time you receive an HTTP 500 error from the broker saying something like “could not gather results”. This is so silly, because the broker should be aware of the state of the cluster and return results from the replica.
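The broker-side behaviour can't be fixed from the client, but transient 500s like this can be masked by retrying the read-only query after a short pause, rotating to the next broker each time. A minimal sketch, assuming queries are safe to repeat; `query_with_retry` and the pluggable `send` transport are hypothetical names, not part of any Druid client library.

```python
import time
from typing import Callable, Sequence, Tuple

# A transport takes (broker_url, query) and returns (http_status, body).
# How it actually speaks HTTP is up to the caller; this sketch only decides where to send.
Transport = Callable[[str, dict], Tuple[int, object]]


def query_with_retry(brokers: Sequence[str], send: Transport, query: dict,
                     max_attempts: int = 3, backoff_s: float = 0.5):
    """Send `query`, moving to the next broker whenever a 5xx comes back."""
    if not brokers:
        raise ValueError("need at least one broker")
    last_status = None
    for attempt in range(max_attempts):
        broker = brokers[attempt % len(brokers)]
        status, body = send(broker, query)
        if status < 500:
            # Success, or a 4xx client error that retrying would not fix.
            return broker, status, body
        last_status = status
        if attempt + 1 < max_attempts:
            time.sleep(backoff_s)  # give the cluster a moment to settle
    raise RuntimeError(f"all {max_attempts} attempts failed; last HTTP status {last_status}")


# Usage with a fake transport: broker1 simulates the transient 500.
def fake_send(broker: str, query: dict) -> Tuple[int, object]:
    return (500, "could not gather results") if "broker1" in broker else (200, [])


print(query_with_retry(["http://broker1:8082", "http://broker2:8082"], fake_send,
                       {"queryType": "timeseries"}, backoff_s=0.0))
```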

Any chance the router node documentation can help you?

Well, that doc says “You should only ever need the router node if you have a Druid cluster well into the terabyte range” and talks about broker tiers. I have around 100 GB of data, but my performance tests show that the broker is the bottleneck, not the historical nodes. So I just wanted a simple way to replicate brokers.

Do you have populateCache enabled on the broker?

Yes, I do.

Try turning populateCache off at the broker level.
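Besides the broker's `druid.broker.cache.populateCache` setting in runtime.properties, Druid native queries accept `useCache` and `populateCache` flags in their `context`, which is handy for testing the effect on a single query before changing the broker config. A minimal sketch that sets them on a query dict; the datasource name and interval are hypothetical, and as far as I know the per-query flag can only switch caching off, not enable it where the node's own config disables it.

```python
import json


def with_cache_flags(query: dict, use_cache: bool, populate_cache: bool) -> dict:
    """Return a copy of a native Druid query with cache flags set in its context."""
    q = dict(query)
    ctx = dict(q.get("context", {}))  # keep any existing context keys (e.g. timeout)
    ctx["useCache"] = use_cache
    ctx["populateCache"] = populate_cache
    q["context"] = ctx
    return q


query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",  # hypothetical datasource name
    "granularity": "hour",
    "intervals": ["2016-01-01/2016-01-02"],
    "aggregations": [{"type": "count", "name": "rows"}],
}
# Read from the cache, but don't spend broker time writing new results into it.
print(json.dumps(with_cache_flags(query, use_cache=True, populate_cache=False), indent=2))
```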

We have set up an ELB in front of our 2 broker nodes and it works pretty well.

Hi Nikita,

Are you by any chance issuing large groupBys and seeing the merge time on the broker as the bottleneck?
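To illustrate why merge time matters: for a groupBy, each historical returns partial aggregates per group and the broker has to combine them by key, so every row from every historical passes through the broker. The toy Python sketch below shows that merge step for a simple count; it is a conceptual illustration only, not Druid's actual merge implementation, and the row layout is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed row layout: (dimension values, partial count) from one historical.
Row = Tuple[Tuple[str, ...], int]


def merge_partials(partials: Iterable[List[Row]]) -> Dict[Tuple[str, ...], int]:
    """Combine per-historical partial counts into one total per group."""
    merged: Dict[Tuple[str, ...], int] = defaultdict(int)
    for rows in partials:          # one result list per historical
        for dims, count in rows:   # every row is touched by the broker
            merged[dims] += count
    return dict(merged)


h1 = [(("us", "web"), 10), (("de", "app"), 3)]
h2 = [(("us", "web"), 5), (("fr", "web"), 7)]
print(merge_partials([h1, h2]))
# {('us', 'web'): 15, ('de', 'app'): 3, ('fr', 'web'): 7}
```

The broker's work grows with the number of groups times the number of historicals answering, which is why large groupBys can make it the bottleneck.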

If not, I suggest you disable caching on the broker and enable it on the historicals themselves. This will cause the historicals to merge results locally and should reduce the merge bottleneck on the broker. If you are using groupBys, I suggest waiting for 0.9.2, which has a completely rewritten groupBy engine that is significantly faster.
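In runtime.properties terms, "cache off on the broker, on at the historicals" roughly means the four settings below. The snippet just prints the cache-related lines for each node type so they can be pasted into that node's runtime.properties; it is a sketch, and cache backend settings (size, type) are deliberately left out because they depend on your Druid version and setup.

```python
# Cache flags per node type: broker stops caching, historicals cache and merge locally.
CACHE_SETTINGS = {
    "broker": {
        "druid.broker.cache.useCache": "false",
        "druid.broker.cache.populateCache": "false",
    },
    "historical": {
        "druid.historical.cache.useCache": "true",
        "druid.historical.cache.populateCache": "true",
    },
}


def render_properties(node_type: str) -> str:
    """Render the cache-related runtime.properties lines for one node type."""
    return "\n".join(f"{k}={v}" for k, v in CACHE_SETTINGS[node_type].items()) + "\n"


for node in ("broker", "historical"):
    print(f"# {node}/runtime.properties")
    print(render_properties(node))
```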