I have set up a Druid cluster (version 0.21) with 2 Coordinators, 3 ZooKeeper nodes, 2 Query servers, and 6 Historical and MiddleManager nodes (HH & MM). All services run on individual nodes.
When I start the cluster with a single Coordinator, all the services come up fine and there are no errors on the admin console. When I bring the other Coordinator up and stop the previous one, everything fails: I see the error messages below on the console and on the MiddleManagers, and my services do not start. I suspect that ZooKeeper is not performing leader election properly, or something else might be wrong. Also, when I bring both Coordinators up together, things still do not start normally, and when I check the API to find the leader, it shows both Coordinators as leaders. Please help, I have been struggling with this for 3 weeks now.
ERROR [BasicAuthenticatorCacheManager-Exec--0] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - Encountered exception while fetching user map for authenticator [ldap]: {class=org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager, exceptionType=class org.apache.druid.java.util.common.IOE, exceptionMessage=No known server}
org.apache.druid.java.util.common.IOE: No known server
at org.apache.druid.discovery.DruidLeaderClient.getCurrentKnownLeader(DruidLeaderClient.java:267) ~[druid-server-0.21.0.jar:0.21.0]
at org.apache.druid.discovery.DruidLeaderClient.makeRequest(DruidLeaderClient.java:122) ~[druid-server-0.21.0.jar:0.21.0]
at org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager.tryFetchUserMapFromCoordinator(CoordinatorPollingBasicAuthenticatorCacheManager.java:252) ~[?:?]
at org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager.lambda$fetchUserMapFromCoordinator$1(CoordinatorPollingBasicAuthenticatorCacheManager.java:192) ~[?:?]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.21.0.jar:0.21.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.21.0.jar:0.21.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.21.0.jar:0.21.0]
at org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager.fetchUserMapFromCoordinator(CoordinatorPollingBasicAuthenticatorCacheManager.java:190) ~[?:?]
at org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager.lambda$start$0(CoordinatorPollingBasicAuthenticatorCacheManager.java:122) ~[?:?]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) [druid-core-0.21.0.jar:0.21.0]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:51) [druid-core-0.21.0.jar:0.21.0]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:97) [druid-core-0.21.0.jar:0.21.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_262]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_262]
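
For reference, the leader check described above can be sketched roughly as follows. This is only a minimal sketch, not an official tool: it asks each Coordinator for the leader it currently knows about through the `/druid/coordinator/v1/leader` endpoint and flags the case where the answers disagree. The host names in the usage note are hypothetical placeholders, 8081 is assumed to be the Coordinator port, and authentication is left out, so credentials would have to be added to `http_fetch` on a cluster with basic auth enabled.

```python
import urllib.request
from typing import Callable, Dict, List

# Coordinator API path that returns the leader this node currently knows about.
LEADER_PATH = "/druid/coordinator/v1/leader"


def http_fetch(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return its body as stripped text.

    Assumption: no authentication. Add credentials here if the cluster
    requires them.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def reported_leaders(
    coordinators: List[str],
    fetch: Callable[[str], str] = http_fetch,
) -> Dict[str, str]:
    """Ask every Coordinator which node it believes is the leader."""
    reports: Dict[str, str] = {}
    for base in coordinators:
        try:
            reports[base] = fetch(base.rstrip("/") + LEADER_PATH)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            reports[base] = f"unreachable: {exc}"
    return reports


def diagnose(reports: Dict[str, str]) -> str:
    """Summarize the answers as 'ok', 'split-brain' or 'no-leader'."""
    leaders = {
        answer
        for answer in reports.values()
        if answer and not answer.startswith("unreachable")
    }
    if not leaders:
        return "no-leader"
    if len(leaders) > 1:
        return "split-brain"
    return "ok"
```

It could be run as `diagnose(reported_leaders(["http://coordinator1.example:8081", "http://coordinator2.example:8081"]))`. A result of `split-brain` would correspond to what I am seeing, where each Coordinator reports a different leader.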