A few questions about running Druid in production

Hi everyone!
I am currently investigating how to use Druid in production.

I have created a test cluster of 4 nodes:

  1. Coordinator, Metadata storage, Overlord, Zookeeper.
  2. Brokers.
  3. Historical, Middle Managers.
  4. Second Historical, Middle Managers.

I am also planning to split the services across separate nodes as described in http://druid.io/docs/latest/configuration/production-cluster.html

I have read almost all of Druid's documentation, but I still have a couple of questions:

  1. How can I back up and restore my cluster?
  2. What happens if a node (for each service) becomes unavailable?
  3. Is it possible to use load balancers?
  4. Does Druid have a replication mechanism, like in Elastic (duplicate node)?

Best regards.

See inline.

  1. How can I back up and restore my cluster?

Deep storage acts as the permanent backup of your data. Segments are stored in deep storage, and metadata entries about those segments are stored in the metadata store. You do not need to back up the cluster explicitly as long as the metadata store and deep storage are themselves backed up and durable. (http://druid.io/docs/latest/dependencies/deep-storage.html)

  2. What happens if a node (for each service) becomes unavailable?

Please note that the setup you described is not HA-enabled. To enable HA for Druid, you need to run more than one broker, coordinator, and overlord. You also need to make sure that ingestion tasks are redundant and that data is loaded on more than one historical node.

Also note that you will need to configure ZooKeeper for HA as well.
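
As a rough illustration of what HA means for ZooKeeper: an ensemble stays available only while a majority of its servers is up. The sketch below is plain arithmetic, not Druid or ZooKeeper code, and the function names are made up for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper servers that forms a majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} server(s): quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A single ZooKeeper server tolerates no failures, and an even-sized ensemble tolerates no more failures than the next smaller odd size, which is why ensembles of 3 or 5 are the usual choice.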

  3. Is it possible to use load balancers?

You can have multiple broker nodes and use a load balancer to distribute queries among them.
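
To show the idea only, here is a minimal client-side round-robin sketch. In practice you would more likely put a dedicated load balancer in front of the brokers. The host names are hypothetical, port 8082 is assumed to be the broker port, and `/druid/v2/` is the broker query endpoint.

```python
from itertools import cycle
from typing import Callable, Sequence


def broker_picker(broker_urls: Sequence[str]) -> Callable[[], str]:
    """Return a function that hands out broker URLs in round-robin order."""
    if not broker_urls:
        raise ValueError("at least one broker URL is required")
    pool = cycle(broker_urls)
    return lambda: next(pool)


if __name__ == "__main__":
    # Hypothetical broker hosts; replace with your own.
    next_broker = broker_picker([
        "http://broker1.example.com:8082",
        "http://broker2.example.com:8082",
    ])
    for _ in range(4):
        # Each query would be POSTed to the broker chosen here.
        print(next_broker() + "/druid/v2/")
```

This sketch does no health checking, so a real load balancer that removes unavailable brokers from rotation is the better option for production.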

  4. Does Druid have a replication mechanism, like in Elastic (duplicate node)?

For replication of data across multiple nodes, you can configure rules on your coordinator node. See http://druid.io/docs/latest/operations/rule-configuration.html
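
As a sketch of what such a rule looks like, the snippet below builds a `loadForever` rule that keeps two copies of every segment in the default tier. The helper name is made up for this example, and the replica count of 2 is only an illustration. The resulting JSON list is what you would set as the rules for a datasource, for example through the coordinator console.

```python
import json


def load_forever_rule(replicas: int, tier: str = "_default_tier") -> dict:
    """Build a Druid 'loadForever' rule keeping `replicas` copies in `tier`."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {"type": "loadForever", "tieredReplicants": {tier: replicas}}


if __name__ == "__main__":
    # Rules are evaluated in order; a single rule covers all segments here.
    rules = [load_forever_rule(2)]
    print(json.dumps(rules, indent=2))
```

With two replicas, each segment is loaded on two different historical nodes, so losing one historical node does not make the data unavailable for queries.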