Druid batch indexing not working with YARN HA

I have been using Druid 0.8.1 on HDP 2.3 (with NameNode HA and a single YARN ResourceManager).

But the same config is failing on an HDP cluster configured with NameNode HA and YARN HA.

This is the message I see in the Indexer log:

2016-04-09T08:21:53,495 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:22:28,917 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:22:44,568 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:19,884 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2016-04-09T08:23:40,845 INFO [task-runner-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2016-04-09T08:23:40,848 WARN [task-runner-0] org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication. Not retrying because failovers (30) exceeded maximum allowed (30)
java.net.ConnectException: Call From druidnode1001.local/10.193.64.154 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source) ~[?:?]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.7.0_79]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[?:1.7.0_79]
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.Client.call(Client.java:1359) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]
at com.sun.proxy.$Proxy194.getNewApplication(Unknown Source) ~[?:?]
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:167) ~[druid_build-assembly-0.1.jar:0.1-SNAPSHOT]

And this is the bin/run-druid script:

exec java `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
  -Dhadoop.dfs.nameservices=nnha \
  -Dhadoop.dfs.ha.namenodes.nnha=nn1,nn2 \
  -Dhadoop.dfs.namenode.rpc-address.nnha.nn1=nn1001.local:8020 \
  -Dhadoop.dfs.namenode.rpc-address.nnha.nn2=nn1002.local:8020 \
  -Dhadoop.dfs.client.failover.proxy.provider.nnha=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  -Dyarn.resourcemanager.ha.enabled=true \
  -Dyarn.resourcemanager.ha.rm-ids=rm1,rm2 \
  -Dyarn.resourcemanager.hostname.rm1=nn1001.local \
  -Dyarn.resourcemanager.hostname.rm2=nn1002.local \
  -Dhdp.version=2.4.0.0-169 \
  -cp "$CONFDIR"/_common:"$CONFDIR"/"$WHATAMI":`ls "$WHEREAMI"/../dist/mz/*.jar | xargs | tr ' ' ':'` \
  `cat "$CONFDIR"/"$WHATAMI"/main.config | xargs`

(For the yarn.resourcemanager.* properties I also tried the -Dhadoop.yarn.resourcemanager… prefix.)

Based on the message “0.0.0.0:8032 failed on connection exception” (0.0.0.0:8032 is the default yarn.resourcemanager.address), it seems the indexing task never picks up the HA properties and therefore cannot resolve the logical RM IDs (rm1 and rm2).
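If that reading is right, the HA properties should at least be visible in yarn-site.xml on the Druid node itself. A quick check along these lines should confirm it (the /etc/hadoop/conf path is the usual HDP location, assumed here):

# Confirm the per-RM properties are visible to clients on the Druid node:
grep -B1 -A2 'yarn.resourcemanager' /etc/hadoop/conf/yarn-site.xml

# If the client-side config is being picked up, this should reach the active RM
# instead of falling back to the 0.0.0.0:8032 default:
yarn node -list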

All Hadoop configs are consistent across the nodes, and I verified that YARN itself is working by running a test MapReduce job from the command line.
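For reference, the test was something along these lines (the examples jar path assumed here is the usual one for an HDP install):

# Run the stock MapReduce pi example to exercise the RM end to end:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 10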

Any suggestions on how this issue can be resolved?

Thank you,

Bikrant

We have no experience running Druid with YARN. If you manage to get things running though, please share the details with us so we can update the docs.

Did you find a solution? Or did you get it working at least? Please let me know.

In my opinion this is definitely an issue with missing Hadoop config files in your setup.
I have previously run batch ingestion on YARN successfully.

You can log in to one of the YARN machines, list the Hadoop config files (generally found under /etc/hadoop/*), and verify that the same files are present on the Hadoop classpath of the machine from which you are running the batch ingestion.
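For example, here is a minimal sketch of how the classpath line in bin/run-druid could pick those files up (the /etc/hadoop/conf path is an assumption; adjust it for your layout). With yarn-site.xml, core-site.xml, and hdfs-site.xml on the classpath, the individual -Dyarn.* and -Dhadoop.dfs.* flags above should no longer be needed:

# Assumed HDP default config dir; adjust for your distribution.
HADOOP_CONF_DIR=/etc/hadoop/conf

# Prepend the Hadoop conf dir so the cluster's own XML files are found first:
exec java `cat "$CONFDIR"/"$WHATAMI"/jvm.config | xargs` \
  -Dhdp.version=2.4.0.0-169 \
  -cp "$HADOOP_CONF_DIR":"$CONFDIR"/_common:"$CONFDIR"/"$WHATAMI":`ls "$WHEREAMI"/../dist/mz/*.jar | xargs | tr ' ' ':'` \
  `cat "$CONFDIR"/"$WHATAMI"/main.config | xargs`

That way the HA failover settings come straight from the cluster's own config files rather than from hand-copied -D properties.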