I have an independent external zk setup.
I installed 2 masters, 3 historicals, 3 middle managers, 2 brokers, and 2 routers,
each started with bin/run-druid $role middle-conf/druid/cluster/$service-category
After all the services had been up for a while, some of the nodes automatically kicked off their shutdown hook and the services went offline.
No exceptions or errors were found in the log.
I’m duplicating my reply to the other conversation:
It might be a ZK quorum issue. If you think that might be the case, you can enable ZK logging (the article was written by Imply, my employer).
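A quick way to rule quorum in or out before digging through logs is to ask each ZK server what role it thinks it has. The sketch below assumes `nc` is installed and that the `srvr` four-letter command is allowed on your servers (it is in the default whitelist on recent ZooKeeper versions, but check your own config). The function names `zk_mode` and `zk_check` and the host names in the usage line are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `srvr` output on stdin and print the value of
# its "Mode:" line (leader, follower, or standalone).
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Hypothetical helper: query every host:port argument and print its mode,
# or "unreachable" if nothing usable came back within the timeout.
zk_check() {
  local hostport host port mode
  for hostport in "$@"; do
    host="${hostport%%:*}"
    port="${hostport##*:}"
    mode="$(printf 'srvr' | nc -w 2 "$host" "$port" 2>/dev/null | zk_mode)"
    echo "$hostport ${mode:-unreachable}"
  done
}
```

Usage, with placeholder host names: `zk_check zk1:2181 zk2:2181 zk3:2181`. A healthy ensemble should report exactly one leader and the rest as followers; an unreachable server or a missing leader would point back at ZK rather than at Druid.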
thanks a lot, will keep you posted.
mark: why would a ZK quorum issue cause some services to shut down automatically?
How can we avoid this? For example, after we init master 1, should we init master 2 half an hour later, and master 3 another half hour after that, to make sure the leader for each service is defined?
It may not be a quorum issue. I set long time intervals between the startup of each node for a given component, but I still saw a node shut down gracefully after 20 minutes.
Make sure you set all of your logging levels to at least INFO. We usually get a stack trace when there is an issue talking to ZK.
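Once logging is at INFO, it can help to look at what each service logged just before it started shutting down. This is only a rough sketch: the function name `shutdown_context` is made up, the log path in the usage line is a placeholder, and matching on the word "shutdown" is an assumption about what your log lines contain.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every log line mentioning "shutdown"
# (case-insensitive) together with the N lines before it (default 20),
# prefixed with line numbers.
shutdown_context() {
  local logfile="$1" lines="${2:-20}"
  grep -i -n -B "$lines" 'shutdown' "$logfile"
}
```

Usage, with a placeholder path: `shutdown_context /path/to/historical.log 50`. Whatever shows up in the lines leading into the first match is usually the place to start reading.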