We are trying to install Druid in cluster mode with 3 machines - one each for master, query and data with a common 3-node zookeeper cluster.
Master processes and data processes have started without any issue, but query services are failing to start. Broker process has come up and no error is found in the log. Router process has the following error trace.
2022-12-06T12:09:23,908 WARN [CoordinatorRuleManager-Exec--0] org.apache.druid.discovery.DruidLeaderClient - Request[http://localhost:8081/druid/coordinator/v1/rules] failed.
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:134) ~[druid-core-24.0.1.jar:24.0.1]
at org.apache.druid.java.util.http.client.AbstractHttpClient.go(AbstractHttpClient.java:33) ~[druid-core-24.0.1.jar:24.0.1]
at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:143) ~[druid-server-24.0.1.jar:24.0.1]
at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:127) ~[druid-server-24.0.1.jar:24.0.1]
at org.apache.druid.server.router.CoordinatorRuleManager.poll(CoordinatorRuleManager.java:137) ~[druid-services-24.0.1.jar:24.0.1]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:55) ~[druid-core-24.0.1.jar:24.0.1]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$1.call(ScheduledExecutors.java:51) ~[druid-core-24.0.1.jar:24.0.1]
at org.apache.druid.java.util.common.concurrent.ScheduledExecutors$2.run(ScheduledExecutors.java:97) ~[druid-core-24.0.1.jar:24.0.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_322]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_322]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_322]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:8081
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_322]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:1.8.0_322]
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.6.Final.jar:?]
... 3 more
2022-12-06T12:09:23,912 WARN [HttpClient-Netty-Boss-0] org.jboss.netty.channel.SimpleChannelUpstreamHandler - EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.
java.net.ConnectException: Connection refused: localhost/127.0.0.1:8081
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_322]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:1.8.0_322]
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.6.Final.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
From the logs, we gather that it is trying to connect to coordinator running locally (127.0.0.1), but coordinator is running on master. The same request is giving response if the localhost is substituted with master’s IP address.
curl -XGET http://<master-ip>:8081/druid/coordinator/v1/rules
{"_default":[{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}]}
There is no trace of localhost anywhere in the query conference directory <>/conf/druid/cluster/query/.
Any pointers on what is being missed here is appreciated.
This is the conf entries in the file
druid.service=druid/router
druid.plaintextPort=8888
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
druid.router.managementProxy.enabled=true
Trouble shooting steps that we have tried.
- Curl-ing the request to master manually. This gives a success response.
- adding
druid.host=<master-ip>
in the query configuration file.