Running CLI hadoop-indexer on HDP 2.3.4 with HA enabled

I’ve been having various difficulties getting the CLI hadoop-indexer to run against our HDP 2.3.4 cluster with HA enabled.

First, I would ask, do you expect this will work? I’ve seen some indications via searches that it may not.

Second, if it’s known to work, excellent, please share any details of importance.

FWIW, a quick summary of my issues:

The Hadoop 2.3.0 jars bundled with Druid are not sufficient to work with HA YARN: I get constant failover retries.

Using my own HDP libraries instead causes various library-mismatch issues, depending on how things are configured (classpath, hadoop coordinates, building the Druid jars from source).

Hi Paul,
http://druid.io/docs/latest/operations/other-hadoop.html has the instructions for making Druid work with custom versions of Hadoop.

Could you share more concrete details on the problem you are facing? Task logs and the stack trace for the exception you are getting?

FWIW, setting up a proper classpath and hadoopCoordinates should work fine for getting Druid to work with HDP 2.3.4.
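
That is, keep your cluster’s Hadoop config dir on the CLI classpath and point the hadoopDependencyCoordinates field of the task spec at a hadoop-client matching your cluster, e.g. (a sketch; pick the version that matches your HDP build):

  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.1"]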

Cheers,

Nishant

Thank you for the prompt response… :slight_smile:

I have used the “other-hadoop” page to formulate quite a few experiments in hopes of a resolution.
I will start with the simplest and provide the tracebacks I’m seeing.

First off, I’m using druid-0.9.1.1.

In the druid-0.9.1.1 folder on an edge node in my cluster, I run:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.3.4.0-3485 -classpath lib/*:/usr/hdp/2.3.4.0-3485/hadoop/conf io.druid.cli.Main index hadoop quickstart/ingest-indProd.json

2016-08-23T16:36:08,055 WARN [main] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

2016-08-23T16:36:08,065 WARN [main] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

2016-08-23T16:36:08,101 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2

2016-08-23T16:36:44,244 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1

2016-08-23T16:37:17,662 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2

2016-08-23T16:37:59,704 INFO [main] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1

It hangs here without launching any jobs. I’m assuming this is related to the HA configuration specified in the config dir included on the classpath.

That config, of course, includes the HA YARN settings, but I fear the bundled 2.3.0 Hadoop dependencies do not support HA YARN.
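
For reference, the rm1/rm2 in those failover messages come from the standard YARN HA properties in our yarn-site.xml, along these lines (hostnames are placeholders):

  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master1.example.com</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>master2.example.com</value></property>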

BTW, I have also tried including my Hadoop classpath on the Druid classpath (and also pulling it in via hadoop dependencies).
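
One of those attempts looked roughly like this, splicing in the output of hadoop classpath (a sketch; paths are illustrative):

  java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.3.4.0-3485 \
    -classpath "lib/*:$(hadoop classpath):/usr/hdp/2.3.4.0-3485/hadoop/conf" \
    io.druid.cli.Main index hadoop quickstart/ingest-indProd.json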

In some cases I can get the jobs to launch, but then I run into the Jackson version issue described under the CDH section of that page (seen in the MR application logs on the cluster).

To work around the Jackson issue, as suggested there, I have added the following to my config.json:

"tuningConfig" : {
  "jobProperties" : {
    "mapreduce.job.user.classpath.first": "true"
  }
}

Then I see (in the MR application log):

2016-08-23T11:51:31,731 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster - Error starting MRAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e3062_1471543406898_6795_01_000001
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182) ~[hadoop-yarn-common-2.3.0.jar:?]
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1355) [hadoop-mapreduce-client-app-2.3.0.jar:?]
Caused by: java.lang.NumberFormatException: For input string: "e3062"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_73]
	at java.lang.Long.parseLong(Long.java:589) ~[?:1.8.0_73]
	at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_73]
	at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137) ~[hadoop-yarn-common-2.3.0.jar:?]
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177) ~[hadoop-yarn-common-2.3.0.jar:?]
	... 1 more
2016-08-23T11:51:31,826 INFO [main] org.apache.hadoop.util.ExitUtil - Exiting with status 1

I interpret this as the wrong, out-of-date jar being used on the cluster.

Any thoughts?

Paul

Hi Paul,

The container ID issue there seems to be from Hadoop 2.3.0 being unable to parse a newer container ID format; can you try:

  1. pulling the Hadoop 2.7.1 dependencies (see the pull-deps sketch after this list):

http://druid.io/docs/latest/operations/pull-deps.html

  2. specifying Hadoop 2.7.1 in the hadoopDependencyCoordinates of the ingestion task:

http://druid.io/docs/latest/operations/other-hadoop.html

e.g.

"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.1"]

Jon

Perfect. Worked for me. Thank you for your quick and kind response.
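
For anyone who finds this thread later: after pulling the 2.7.1 client via pull-deps, our spec ended up carrying both fixes, roughly (a sketch of the relevant fragments, not the full spec):

  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.1"],
  ...
  "tuningConfig" : {
    "jobProperties" : {
      "mapreduce.job.user.classpath.first" : "true"
    }
  }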