Druid Fault Tolerance

Hi there,
I’m trying to understand how Druid provides fault tolerance. Based on my understanding:

  1. Historical node: (If a historical node goes down, the Druid coordinator will try to place its segments on another historical node. Historical nodes in this case are the slaves in a typical master-slave architecture.)

  2. Broker: (We can have multiple broker nodes behind load balancers. The broker in this case is a typical client.)

  3. Druid MiddleManager: (The middle manager can have replication. So if the replication factor is 2, the Overlord will create 2 peons in the MiddleManager. This is a typical master-master fan-out write. If one peon goes down, Druid will run with a single peon for that segment granularity period.)

  4. Tranquility: A typical Kafka consumer in my case, so all the properties of a Kafka consumer group apply here.
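
To make point 1 concrete, here is how I picture the historical failover, as a toy Python sketch. This is not Druid code: the function name, the node names, and the "least-loaded node first" rule are all my own assumptions for illustration, and the real coordinator's balancing logic is surely more involved.

```python
def reassign_segments(assignments, failed_node):
    """Toy model: move each segment of failed_node to the least-loaded survivor.

    assignments maps a (hypothetical) historical node name to its set of segments.
    """
    # Copy the surviving nodes so the caller's data is left untouched.
    survivors = {
        node: set(segments)
        for node, segments in assignments.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no historical node left to load the segments")

    # Hand out the orphaned segments one at a time, always to the node that
    # currently holds the fewest segments (ties broken by node name).
    for segment in sorted(assignments.get(failed_node, ())):
        target = min(sorted(survivors), key=lambda node: len(survivors[node]))
        survivors[target].add(segment)
    return survivors


cluster = {
    "historical-1": {"seg-a", "seg-b"},
    "historical-2": {"seg-c"},
    "historical-3": set(),
}
after = reassign_segments(cluster, "historical-1")
```

In this toy run, the two segments that were on historical-1 end up spread across historical-2 and historical-3, which is what I assume the coordinator aims for.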

Things I’m not clear on:

  1. Druid coordinator: Is it active-passive fault tolerance? If the active coordinator fails, how does the passive node get control?

  2. Druid Overlord: Do the same questions apply here?

Thank you.