I have set up an external load balancer in round-robin fashion. I don’t think this is addressed in the Druid docs. It should be, because external services are not part of the cluster and don’t use ZooKeeper to discover the other brokers’ metadata, which they would need for client-side load balancing, for example.
It would be nice to have a Druid Broker Balancer node that load balances between broker nodes, so you don’t have to depend on an external load balancer to achieve HA.
But I have noticed this doesn’t give you full HA: when some historical nodes (or some other query nodes) go down, regardless of whether you have multiple replicas, for some time you receive an Error 500 from the broker saying something like “could not gather results”. This is so silly, because the broker should be aware of the state of the cluster and return results from a replica.
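
As a stopgap, a client can do the round-robin rotation and the retry itself. Below is a minimal sketch, not an official Druid client: the class name `RoundRobinBrokerClient`, the `rounds` parameter, and the broker hostnames in the usage note are all made up for illustration. It assumes the broker URLs are configured statically (no ZooKeeper discovery) and that a 5xx response or a connection error is worth retrying on the next broker. Retrying only hides the transient Error 500 from the caller; it does not fix the underlying behaviour.

```python
import itertools
import json
import urllib.error
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class RoundRobinBrokerClient:
    """Hypothetical helper: rotates over brokers and fails over on errors."""

    def __init__(self, broker_urls, send=post_json, rounds=2):
        if not broker_urls:
            raise ValueError("at least one broker URL is required")
        self._count = len(broker_urls)
        self._cycle = itertools.cycle(broker_urls)
        self._send = send      # injectable so the logic can be tested offline
        self._rounds = rounds  # assumption: how many passes over all brokers

    def query(self, payload):
        last_error = None
        for _ in range(self._count * self._rounds):
            url = next(self._cycle)
            try:
                return self._send(url, payload)
            except urllib.error.HTTPError as error:
                if error.code < 500:
                    raise  # a 4xx means the query itself is bad; do not retry
                last_error = error  # 5xx: try the next broker
            except urllib.error.URLError as error:
                last_error = error  # broker unreachable: try the next one
        raise RuntimeError("all brokers failed") from last_error
```

Usage would look something like `RoundRobinBrokerClient(["http://broker1:8082/druid/v2/", "http://broker2:8082/druid/v2/"]).query(my_query)`, where the hostnames are placeholders and `my_query` is a native Druid query as a dict. A short sleep between passes would probably be sensible in practice, since the failure window described above lasts "for some time".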