Druid 0.9.1.1 indexing with Hadoop 2.7.3 (and maybe with 2.6)

Hi,

Like many other folks, I’ve had a challenging time getting Druid indexing working with other versions of Hadoop. There always seems to be some form of Jackson conflict, and a lot of discussion about potentially improving class loader isolation.

I started poking around and was surprised to find that Hadoop lets you run a job in a separate class loader, a feature that was introduced in Hadoop 2.6 according to the tickets. I managed to get Hadoop 2.7.3 and Druid 0.9.1.1 working just by setting the following in the indexing task:

"hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.7.3",

And in the tuningConfig:

"jobProperties": {
  "mapreduce.job.classloader": "true",
  "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop."
}

This should mean that the indexing job now runs in an isolated class loader. The value of "mapreduce.job.classloader.system.classes" is the default value with "-javax.validation." prepended. That was all that was necessary to make it go.
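For anyone who wants to generate these properties rather than hand-edit them, here is a minimal Python sketch of the "default value plus a prepended exclusion" idea. The function name `build_job_properties` is made up for illustration, and the default string is simply copied from the value in this post, so please check it against the mapred-default.xml of your own Hadoop version. As far as I understand it, a leading "-" marks a prefix as not being a system class, so those classes are loaded from the job's own jars.

```python
import json

# Default system classes as quoted in this post (the part after
# "-javax.validation.,"). Assumption: copied from the thread, not read
# from Hadoop itself; verify against your Hadoop version's defaults.
DEFAULT_SYSTEM_CLASSES = (
    "java.,javax.,org.apache.commons.logging.,"
    "org.apache.log4j.,org.apache.hadoop."
)


def build_job_properties(excluded_prefixes, default=DEFAULT_SYSTEM_CLASSES):
    """Build jobProperties that enable the isolated job class loader.

    Each prefix in excluded_prefixes is written with a leading "-" and
    placed in front of the default list, as in the post above.
    (Hypothetical helper, not part of Druid or Hadoop.)
    """
    exclusions = ["-" + prefix for prefix in excluded_prefixes]
    system_classes = ",".join(exclusions + default.split(","))
    return {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.classloader.system.classes": system_classes,
    }


if __name__ == "__main__":
    props = build_job_properties(["javax.validation."])
    # Print a fragment that can be pasted into the tuningConfig.
    print(json.dumps({"jobProperties": props}, indent=2))
```

Running it prints a "jobProperties" fragment for the tuningConfig. If another class turns out to be missing in the same way, its package prefix could be added to the list passed in.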

If some other folks can give this a try and it works, we could get this added to https://github.com/druid-io/druid/blob/master/docs/content/operations/other-hadoop.md

Cheers

Mark

Hey Mark,

Did you need the mapreduce.job.classloader.system.classes part; did things fail without it? If so what error did you get?

So far in my experience, "mapreduce.job.classloader": "true" by itself is generally enough to get Druid working on a variety of Hadoop versions (and this is called out in other-hadoop.md). But if you found a situation where that’s not enough then it’d be good for us to document that. A PR to other-hadoop.md would be welcome with that info!

I got the exception below.

Error: java.lang.ClassNotFoundException: javax.validation.Validator

I assume Validator is not part of the vanilla Apache Hadoop install, or perhaps the Java install. I’m using Java version "1.8.0_101".

hmmm

The validator is imported by api/src/main/java/io/druid/guice/JsonConfigurator.java

I thought maybe an extension I was using required the class, so not everyone would see the issue. But since this is in the api module, I’m surprised more people haven’t run into the problem.

Puzzling.

It’s about time I figured out the ins and outs of git pull requests, so I’ll put something together in the next couple of days.

Hi Mark, Gian,
Did you resolve the problem?

I am having the same problem, but I am using Hadoop 2.7.2. My task fails every time.

Total time for which application threads were stopped: 0.0006047 seconds, Stopping threads took: 0.0001235 seconds
2016-10-24T04:10:56,950 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000011_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
2016-10-24T04:10:56,967 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000004_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
2016-10-24T04:10:56,968 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000006_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
2016-10-24T04:10:56,971 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000001_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
2016-10-24T04:10:56,971 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000002_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
2016-10-24T04:10:56,972 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1477229142752_0013_m_000009_0, Status : FAILED
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;


I tried both setting this in the task spec:

"hadoopDependencyCoordinates": [
  "org.apache.hadoop:hadoop-client:2.7.2"
]

and setting druid.indexer.task.defaultHadoopCoordinates: ["org.apache.hadoop:hadoop-client:2.7.2"] in the middle manager configuration.

Regards,
Chanh

Hi guys,

Some more detail: I am using Druid 0.9.2-rc2.
Below is the peon task runner command, and the full logs are attached.

java -cp /build/etl/imply-middlemanager-1.3.0/conf/druid/_common:/build/etl/imply-middlemanager-1.3.0/conf/druid/middleManager:dist/druid/lib/guice-multibindings-4.1.0.jar:dist/druid/lib/aws-java-sdk-sqs-1.10.21.jar:dist/druid/lib/commons-pool2-2.2.jar:dist/druid/lib/opencsv-2.3.jar:dist/druid/lib/bytebuffer-collections-0.2.5.jar:dist/druid/lib/jackson-core-2.4.6.jar:dist/druid/lib/aws-java-sdk-importexport-1.10.21.jar:dist/druid/lib/jetty-continuation-9.2.5.v20141112.jar:dist/druid/lib/log4j-api-2.5.jar:dist/druid/lib/wagon-provider-api-2.4.jar:dist/druid/lib/curator-framework-2.11.0.jar:dist/druid/lib/curator-x-discovery-2.11.0.jar:dist/druid/lib/jetty-http-9.2.5.v20141112.jar:dist/druid/lib/aws-java-sdk-cloudsearch-1.10.21.jar:dist/druid/lib/aws-java-sdk-swf-libraries-1.10.21.jar:dist/druid/lib/netty-3.10.4.Final.jar:dist/druid/lib/compress-lzf-1.0.3.jar:dist/druid/lib/irc-api-1.0-0014.jar:dist/druid/lib/config-magic-0.9.jar:dist/druid/lib/google-http-client-jackson2-1.15.0-rc.jar:dist/druid/lib/lz4-1.3.0.jar:dist/druid/lib/jcl-over-slf4j-1.7.12.jar:dist/druid/lib/aws-java-sdk-kinesis-1.10.21.jar:dist/druid/lib/druid-console-0.0.2.jar:dist/druid/lib/log4j-jul-2.5.jar:dist/druid/lib/commons-logging-1.1.1.jar:dist/druid/lib/aws-java-sdk-cloudtrail-1.10.21.jar:dist/druid/lib/commons-math3-3.6.1.jar:dist/druid/lib/maven-model-3.1.1.jar:dist/druid/lib/druid-server-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/jackson-datatype-guava-2.4.6.jar:dist/druid/lib/annotations-2.0.3.jar:dist/druid/lib/guava-16.0.1.jar:dist/druid/lib/jsr311-api-1.1.1.jar:dist/druid/lib/aws-java-sdk-cloudfront-1.10.21.jar:dist/druid/lib/aws-java-sdk-datapipeline-1.10.21.jar:dist/druid/lib/classmate-1.0.0.jar:dist/druid/lib/aws-java-sdk-config-1.10.21.jar:dist/druid/lib/jsr305-2.0.1.jar:dist/druid/lib/log4j-slf4j-impl-2.5.jar:dist/druid/lib/maven-settings-3.1.1.jar:dist/druid/lib/javax.servlet-api-3.1.0.jar:dist/druid/lib/aws-java-sdk-cloudformation-1.10.21.jar:dist/druid/lib/json-path-2.1.0.jar:dist/dru
id/lib/curator-recipes-2.11.0.jar:dist/druid/lib/httpcore-4.4.3.jar:dist/druid/lib/jersey-server-1.19.jar:dist/druid/lib/hibernate-validator-5.1.3.Final.jar:dist/druid/lib/commons-codec-1.7.jar:dist/druid/lib/extendedset-1.3.10.jar:dist/druid/lib/disruptor-3.3.0.jar:dist/druid/lib/curator-client-2.11.0.jar:dist/druid/lib/zookeeper-3.4.9.jar:dist/druid/lib/plexus-utils-3.0.15.jar:dist/druid/lib/maven-model-builder-3.1.1.jar:dist/druid/lib/jetty-servlets-9.2.5.v20141112.jar:dist/druid/lib/druid-processing-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/mapdb-1.0.8.jar:dist/druid/lib/jdbi-2.63.1.jar:dist/druid/lib/aws-java-sdk-redshift-1.10.21.jar:dist/druid/lib/http-client-1.0.4.jar:dist/druid/lib/jersey-servlet-1.19.jar:dist/druid/lib/aws-java-sdk-cognitoidentity-1.10.21.jar:dist/druid/lib/protobuf-java-2.5.0.jar:dist/druid/lib/aws-java-sdk-simpleworkflow-1.10.21.jar:dist/druid/lib/server-metrics-0.2.8.jar:dist/druid/lib/plexus-interpolation-1.19.jar:dist/druid/lib/icu4j-4.8.1.jar:dist/druid/lib/emitter-0.3.6.jar:dist/druid/lib/jackson-dataformat-smile-2.4.6.jar:dist/druid/lib/joda-time-2.8.2.jar:dist/druid/lib/jersey-guice-1.19.jar:dist/druid/lib/jackson-datatype-joda-2.4.6.jar:dist/druid/lib/aws-java-sdk-iam-1.10.21.jar:dist/druid/lib/aws-java-sdk-s3-1.10.21.jar:dist/druid/lib/aws-java-sdk-1.10.21.jar:dist/druid/lib/commons-pool-1.6.jar:dist/druid/lib/aws-java-sdk-directconnect-1.10.21.jar:dist/druid/lib/jetty-servlet-9.2.5.v20141112.jar:dist/druid/lib/druid-indexing-hadoop-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/aws-java-sdk-elasticloadbalancing-1.10.21.jar:dist/druid/lib/aws-java-sdk-elastictranscoder-1.10.21.jar:dist/druid/lib/aws-java-sdk-directory-1.10.21.jar:dist/druid/lib/derbyclient-10.11.1.1.jar:dist/druid/lib/antlr4-runtime-4.5.1.jar:dist/druid/lib/druid-common-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/aws-java-sdk-elasticbeanstalk-1.10.21.jar:dist/druid/lib/aws-java-sdk-route53-1.10.21.jar:dist/druid/lib/jackson-annotations-2.4.6.jar:dist/druid/lib/okhttp-1.0.2.jar:dist
/druid/lib/aws-java-sdk-support-1.10.21.jar:dist/druid/lib/httpclient-4.5.1.jar:dist/druid/lib/commons-lang-2.6.jar:dist/druid/lib/commons-cli-1.2.jar:dist/druid/lib/jetty-client-9.2.5.v20141112.jar:dist/druid/lib/aether-impl-0.9.0.M2.jar:dist/druid/lib/aws-java-sdk-efs-1.10.21.jar:dist/druid/lib/slf4j-api-1.7.12.jar:dist/druid/lib/java-xmlbuilder-1.1.jar:dist/druid/lib/aether-connector-okhttp-0.0.9.jar:dist/druid/lib/jackson-jaxrs-smile-provider-2.4.6.jar:dist/druid/lib/aws-java-sdk-rds-1.10.21.jar:dist/druid/lib/jackson-jaxrs-base-2.4.6.jar:dist/druid/lib/jetty-server-9.2.5.v20141112.jar:dist/druid/lib/activation-1.1.1.jar:dist/druid/lib/aether-util-0.9.0.M2.jar:dist/druid/lib/aws-java-sdk-cloudwatchmetrics-1.10.21.jar:dist/druid/lib/geoip2-0.4.0.jar:dist/druid/lib/base64-2.3.8.jar:dist/druid/lib/jets3t-0.9.4.jar:dist/druid/lib/maven-aether-provider-3.1.1.jar:dist/druid/lib/aws-java-sdk-codedeploy-1.10.21.jar:dist/druid/lib/aws-java-sdk-autoscaling-1.10.21.jar:dist/druid/lib/guice-4.1.0.jar:dist/druid/lib/jackson-module-jaxb-annotations-2.4.6.jar:dist/druid/lib/aws-java-sdk-kms-1.10.21.jar:dist/druid/lib/java-util-0.27.10.jar:dist/druid/lib/jetty-util-9.2.5.v20141112.jar:dist/druid/lib/jetty-io-9.2.5.v20141112.jar:dist/druid/lib/maxminddb-0.2.0.jar:dist/druid/lib/javax.inject-1.jar:dist/druid/lib/jersey-core-1.19.jar:dist/druid/lib/jboss-logging-3.1.3.GA.jar:dist/druid/lib/aws-java-sdk-ssm-1.10.21.jar:dist/druid/lib/druid-indexing-service-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/jackson-databind-2.4.6.jar:dist/druid/lib/spymemcached-2.11.7.jar:dist/druid/lib/druid-api-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/aws-java-sdk-emr-1.10.21.jar:dist/druid/lib/aws-java-sdk-logs-1.10.21.jar:dist/druid/lib/aether-spi-0.9.0.M2.jar:dist/druid/lib/aws-java-sdk-dynamodb-1.10.21.jar:dist/druid/lib/aws-java-sdk-machinelearning-1.10.21.jar:dist/druid/lib/bcprov-jdk15on-1.52.jar:dist/druid/lib/guice-servlet-4.1.0.jar:dist/druid/lib/jetty-proxy-9.2.5.v20141112.jar:dist/druid/lib/maven-repo
sitory-metadata-3.1.1.jar:dist/druid/lib/jackson-core-asl-1.9.13.jar:dist/druid/lib/jackson-mapper-asl-1.9.13.jar:dist/druid/lib/rhino-1.7R5.jar:dist/druid/lib/aws-java-sdk-lambda-1.10.21.jar:dist/druid/lib/aws-java-sdk-ses-1.10.21.jar:dist/druid/lib/aether-api-0.9.0.M2.jar:dist/druid/lib/aws-java-sdk-ec2-1.10.21.jar:dist/druid/lib/RoaringBitmap-0.5.18.jar:dist/druid/lib/jetty-security-9.2.5.v20141112.jar:dist/druid/lib/javax.el-3.0.0.jar:dist/druid/lib/aws-java-sdk-codepipeline-1.10.21.jar:dist/druid/lib/aws-java-sdk-cloudhsm-1.10.21.jar:dist/druid/lib/derby-10.11.1.1.jar:dist/druid/lib/log4j-1.2-api-2.5.jar:dist/druid/lib/aws-java-sdk-storagegateway-1.10.21.jar:dist/druid/lib/javax.el-api-3.0.0.jar:dist/druid/lib/aws-java-sdk-devicefarm-1.10.21.jar:dist/druid/lib/aws-java-sdk-cloudwatch-1.10.21.jar:dist/druid/lib/aopalliance-1.0.jar:dist/druid/lib/aws-java-sdk-ecs-1.10.21.jar:dist/druid/lib/druid-aws-common-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/maven-settings-builder-3.1.1.jar:dist/druid/lib/aws-java-sdk-workspaces-1.10.21.jar:dist/druid/lib/aws-java-sdk-opsworks-1.10.21.jar:dist/druid/lib/jline-0.9.94.jar:dist/druid/lib/aws-java-sdk-sts-1.10.21.jar:dist/druid/lib/druid-services-0.9.2-rc2-SNAPSHOT.jar:dist/druid/lib/log4j-core-2.5.jar:dist/druid/lib/aws-java-sdk-cognitosync-1.10.21.jar:dist/druid/lib/aws-java-sdk-simpledb-1.10.21.jar:dist/druid/lib/aws-java-sdk-glacier-1.10.21.jar:dist/druid/lib/tesla-aether-0.0.5.jar:dist/druid/lib/jackson-jaxrs-json-provider-2.4.6.jar:dist/druid/lib/commons-dbcp2-2.0.1.jar:dist/druid/lib/aether-connector-file-0.9.0.M2.jar:dist/druid/lib/validation-api-1.1.0.Final.jar:dist/druid/lib/aws-java-sdk-core-1.10.21.jar:dist/druid/lib/airline-0.7.jar:dist/druid/lib/aws-java-sdk-sns-1.10.21.jar:dist/druid/lib/derbynet-10.11.1.1.jar:dist/druid/lib/commons-io-2.4.jar:dist/druid/lib/aws-java-sdk-elasticache-1.10.21.jar:dist/druid/lib/aws-java-sdk-codecommit-1.10.21.jar -server -Xmx5g -Xms5g -XX:NewSize=1g -XX:MaxDirectMemorySize=10g 
-XX:+UseConcMarkSweepGC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=300 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=85 -verbosegc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=500 -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Ddruid.indexer.task.baseTaskDir=var/druid/task -Ddruid.metadata.storage.connector.password=druid -Ddruid.indexer.fork.property.druid.monitoring.monitors=[“com.metamx.metrics.JvmMonitor”] -Ddruid.indexer.fork.property.druid.processing.numThreads=3 -Ddruid.emitter.logging.logLevel=info -Ddruid.indexer.fork.property.druid.server.http.numThreads=50 -Ddruid.emitter=logging -Ddruid.indexer.fork.property.druid.processing.buffer.sizeBytes=50000000 -Ddruid.indexer.task.restoreTasksOnRestart=true -Duser.timezone=UTC -Dfile.encoding.pkg=sun.io -Ddruid.storage.storageDirectory=hdfs://10.199.0.19:9000/druid/segments -Ddruid.selectors.coordinator.serviceName=druid/coordinator -Ddruid.extensions.directory=dist/druid/extensions -Ddruid.selectors.indexing.serviceName=druid/overlord -Ddruid.port=8091 -Ddruid.worker.capacity=25 -Ddruid.extensions.hadoopDependenciesDir=dist/druid/hadoop-dependencies -Ddruid.service=druid/middlemanager -Ddruid.metadata.storage.connector.user=druid -Ddruid.metadata.storage.type=postgresql -Ddruid.metadata.storage.connector.connectURI=jdbc:postgresql://10.199.0.19:5432/druid -Djava.io.tmpdir=var/tmp -Ddruid.extensions.loadList=[“druid-spark-batch”, “druid-hdfs-storage”,“postgresql-metadata-storage”,“druid-kafka-indexing-service”] -Ddruid.startup.logging.logProperties=true -Ddruid.zk.service.host=zoo1:2182,zoo1:2183,zoo2:2182 -Ddruid.monitoring.monitors=[“com.metamx.metrics.JvmMonitor”] -Ddruid.indexer.logs.directory=hdfs://10.199.0.19:9000/druid/indexing-logs -Ddruid.zk.paths.base=/druid/prod 
-Dfile.encoding=UTF-8 -Ddruid.indexer.task.defaultHadoopCoordinates=[“org.apache.hadoop:hadoop-client:2.7.2”] -Ddruid.storage.type=hdfs -Ddruid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp -Ddruid.indexer.logs.type=hdfs -Ddruid.monitoring.monitors=[“com.metamx.metrics.JvmMonitor”] -Ddruid.processing.numThreads=3 -Ddruid.server.http.numThreads=50 -Ddruid.processing.buffer.sizeBytes=50000000 -Ddruid.metrics.emitter.dimension.dataSource=ad_statistics_history -Ddruid.metrics.emitter.dimension.taskId=index_hadoop_ad_statistics_history_2016-10-24T04:16:33.823Z -Ddruid.host=adx-pro-vdchcm-ants20-ip-10-199-0-20 -Ddruid.port=8106 io.druid.cli.Main internal peon var/druid/task/index_hadoop_ad_statistics_history_2016-10-24T04:16:33.823Z/task.json var/druid/task/index_hadoop_ad_statistics_history_2016-10-24T04:16:33.823Z/c9294e11-8cb2-4f72-bba0-082f4beaf9a5/status.json


fulllog.txt (2.65 MB)

druid.io/docs/0.9.2-rc1/operations/other-hadoop.html has updated workarounds for different versions of Hadoop

Hi Mark,

Thanks for the tip! I’ve added a section to the “other hadoop” docs with your example in this PR:

https://github.com/druid-io/druid/pull/3706

Jon