I am trying to set up a Druid cluster with:
- 3 zookeepers
- 2 overlords and coordinators
- 2 data servers
- 2 query servers
The issue I am having: after setting up 3 ZooKeeper instances on 3 different hosts, I start the query server, and when it tries to talk to one of the ranked ZooKeepers, I get an error saying “druid/coordinator service cannot be found”.
Does this mean the ZooKeeper instances should have coordinators running on them as well? If I choose to have 3 ZooKeepers, should I have 3 master nodes running as well?
The reason behind the question is that if my ZooKeeper hosts also have the “coordinator” process running, it seems to work. If only ZooKeeper is running and a query process tries to make contact, it fails.
Thanks a ton for your response.
Most likely your ZooKeeper setup has an issue. Can you make sure your coordinators are present in all 3 ZooKeeper instances?
You do not need to run the coordinator on the same nodes as ZooKeeper. Three things: 1) make sure you have configured the three ZooKeeper instances to run as a three-node ensemble; 2) use consistent ZooKeeper path names and settings on all Druid nodes; 3) start the coordinator service before you start the query services.
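To make points 1 and 2 concrete, here is a minimal sketch in Python that generates the ensemble section of zoo.cfg and the matching Druid ZooKeeper properties from one host list, so every node ends up with the same settings. The hostnames, the data directory, and the helper names (zoo_cfg, druid_zk_properties) are assumptions for illustration only; ports 2181, 2888, and 3888 are just the conventional ZooKeeper defaults, so adjust them to your setup.

```python
# Sketch only: hostnames, paths, and helper names below are hypothetical.

def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Return zoo.cfg text that is identical on every ensemble member."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # One server.N line per member; 2888 is the peer port and 3888 the
    # leader-election port (conventional values). Each host also needs a
    # file named "myid" in dataDir containing only its own N.
    for n, host in enumerate(hosts, start=1):
        lines.append(f"server.{n}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def druid_zk_properties(hosts, client_port=2181, base_path="/druid"):
    """Return the ZooKeeper lines every Druid node should share."""
    connect = ",".join(f"{h}:{client_port}" for h in hosts)
    return (
        f"druid.zk.service.host={connect}\n"
        f"druid.zk.paths.base={base_path}\n"
    )


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own.
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    print(zoo_cfg(hosts))
    print(druid_zk_properties(hosts))
```

The point of generating both from the same list is that the master, data, and query nodes all see the same connect string and base path, which is what point 2 asks for.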
Thanks Marc, yes once the coordinators are running, it seems to work fine.
So let me know if my understanding below is right:
a) If you plan to have multiple ZooKeepers, but not a separate ZooKeeper cluster, then you will need to run them with coordinators.
b) The other option is to go for a completely separate ZooKeeper cluster.
Thanks for your reply Chris.
I am not running a separate ZooKeeper cluster, but I want to run ZooKeeper on multiple nodes as a fail-safe.
So I would have to run them on the master nodes (with the coordinator process started), correct?
Perhaps you can share your configuration? If it works when ZooKeeper and the coordinator/overlord are co-located, there may be an incorrect config, or the nodes cannot see each other, or both.
Oh, wait… when you say you don’t want to run a separate ZooKeeper cluster, but rather install on three nodes as a fail-safe, do you mean three completely separate ZooKeeper installations? If so, that’s not how ZooKeeper works. By installing three instances (use an odd number, since ZooKeeper needs a majority quorum) as one cluster (it does not matter whether they are co-located with coordinators), you get the redundancy you are looking for.
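One way to tell three independent installations apart from a real ensemble is to ask each instance for its mode with ZooKeeper's "srvr" four-letter command. The sketch below is Python, with hypothetical hostnames and helper names; it assumes "srvr" is allowed by the server's four-letter-word whitelist and that there are no observers. Standalone installations report "Mode: standalone", while ensemble members report "leader" or "follower".

```python
import socket


def four_letter_word(host, port=2181, cmd=b"srvr", timeout=3.0):
    """Send a four-letter command to ZooKeeper and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def is_healthy_ensemble(modes):
    """True when there is exactly one leader and the rest are followers."""
    return modes.count("leader") == 1 and all(
        m in ("leader", "follower") for m in modes
    )


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own.
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    modes = [parse_mode(four_letter_word(h)) for h in hosts]
    print(dict(zip(hosts, modes)))
    print("ensemble OK" if is_healthy_ensemble(modes) else "not an ensemble")
```

If all three hosts report standalone, they are separate installations that do not share data, so a coordinator registered in one will not be visible to a query server connected to another.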
Got you, thanks a ton Aaron.
One last question. When I run 3 ZooKeeper installations, but list all 3 with their ports in the common.runtime of master, data and query as below
would it not be a valid “cluster” configuration?