Druid Fault Tolerance

Hello guys,

I need your help to achieve fault tolerance.

I am very new to Druid. I tried a single-node Druid setup and it works fine. Now I want to set up a Druid cluster with fault tolerance on AWS using 3 machines (c3.4xlarge).

To achieve fault tolerance I am considering the following architecture. Please help me correct it.

ZooKeeper servers - 3 (ZooKeeper cluster)

Coordinator servers - (not sure how to achieve fault tolerance)

Broker servers - 3 (All will be behind load balancer)

Historical servers - 3

Overlord servers - 3

MiddleManager servers - (not sure how to achieve fault tolerance)

Pivot servers - 3 (all will be behind an ELB)

Metadata server - MySQL with master-slave replication topology

Deep Storage - AWS S3
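
For the ZooKeeper part, my understanding is that an ensemble stays available only while a majority of its servers are up. A minimal sketch of that arithmetic (the function name is mine, just for illustration):

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers a majority-quorum ensemble can lose and still have a majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


# Compare a few ensemble sizes, including the 3-server plan above.
for size in (1, 3, 5):
    print(f"{size} ZooKeeper server(s): survives {tolerated_failures(size)} failure(s)")
```

So with 3 ZooKeeper servers I expect to survive the loss of one machine, but not two.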


As per the doc (http://druid.io/docs/latest/design/design.html - section “Fault Tolerance”), the layout would be:

Server A - Coordinator, Overlord, ZooKeeper, and metastore (Master Server)

Server B - Historicals, MiddleManagers, and Tranquility (Data Server)

Server C - Druid Brokers, Pivot, PlyQL (Query Server)

Deep Storage - AWS S3

But this setup does not give me fault tolerance, because each role runs on only one server.

Please help me understand how I can achieve this.
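
To show what I mean, here is a rough sketch of my own (not from the Druid docs) that takes the three-server layout above and lists which roles disappear when one server goes down. The role names are just labels I chose, and the helper functions are hypothetical.

```python
# Layout copied from the three-server plan above; role names are only labels.
layout = {
    "Server A": {"coordinator", "overlord", "zookeeper", "metastore"},
    "Server B": {"historical", "middlemanager", "tranquility"},
    "Server C": {"broker", "pivot", "plyql"},
}


def roles_lost_if_down(layout, server):
    """Return the roles that have no remaining host when `server` fails."""
    remaining = set()
    for name, roles in layout.items():
        if name != server:
            remaining |= roles
    return sorted(layout[server] - remaining)


def single_points_of_failure(layout):
    """Map each server to the roles that would be lost with it (empty ones omitted)."""
    result = {}
    for server in layout:
        lost = roles_lost_if_down(layout, server)
        if lost:
            result[server] = lost
    return result


for server, lost in single_points_of_failure(layout).items():
    print(f"{server} down -> lost roles: {', '.join(lost)}")
```

Every server shows up in the output, so losing any one machine takes some role down completely.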