Hadoop Index tasks failing with java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.Job

Hi!

We are trying to update our small development Druid cluster from 0.13.0-incubating to 0.14.1-incubating, but we cannot get the Hadoop indexing tasks to run. The Peons all fail with the following exception:

2019-05-20T09:10:49,506 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.HadoopIndexTask - Got invocation target exception in run(), cause:
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/Job
at java.lang.Class.getDeclaredMethods0(Native Method) ~[?:1.8.0_51]
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) ~[?:1.8.0_51]
at java.lang.Class.getDeclaredMethods(Class.java:1975) ~[?:1.8.0_51]
at com.fasterxml.jackson.databind.introspect.AnnotatedClass._findClassMethods(AnnotatedClass.java:1053) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.AnnotatedClass._addMemberMethods(AnnotatedClass.java:605) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.AnnotatedClass.resolveMemberMethods(AnnotatedClass.java:429) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.AnnotatedClass.memberMethods(AnnotatedClass.java:253) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addMethods(POJOPropertiesCollector.java:477) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:284) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getPropertyMap(POJOPropertiesCollector.java:248) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getProperties(POJOPropertiesCollector.java:155) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.BasicBeanDescription._properties(BasicBeanDescription.java:142) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findProperties(BasicBeanDescription.java:217) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._findCreatorsFromProperties(BasicDeserializerFactory.java:333) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._constructDefaultValueInstantiator(BasicDeserializerFactory.java:315) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findValueInstantiator(BasicDeserializerFactory.java:254) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:222) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:142) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:403) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:352) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:264) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.DeserializationContext.findContextualValueDeserializer(DeserializationContext.java:428) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:179) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:108) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:93) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromAny(AsPropertyTypeDeserializer.java:165) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.JsonDeserializer.deserializeWithType(JsonDeserializer.java:149) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:42) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3454) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3378) ~[jackson-databind-2.6.7.jar:2.6.7]
at org.apache.druid.indexer.HadoopDruidIndexerConfig.&lt;init&gt;(HadoopDruidIndexerConfig.java:227) ~[druid-indexing-hadoop-0.14.1-incubating.jar:0.14.1-incubating]
at org.apache.druid.indexer.HadoopDruidIndexerConfig.fromSpec(HadoopDruidIndexerConfig.java:140) ~[druid-indexing-hadoop-0.14.1-incubating.jar:0.14.1-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessingRunner.runTask(HadoopIndexTask.java:607) ~[druid-indexing-service-0.14.1-incubating.jar:0.14.1-incubating]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_51]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_51]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_51]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_51]
at org.apache.druid.indexing.common.task.HadoopIndexTask.runInternal(HadoopIndexTask.java:309) ~[druid-indexing-service-0.14.1-incubating.jar:0.14.1-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:244) [druid-indexing-service-0.14.1-incubating.jar:0.14.1-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.14.1-incubating.jar:0.14.1-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.14.1-incubating.jar:0.14.1-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.Job
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_51]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_51]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_51]
… 47 more


We are using a self-compiled version built against Cloudera Hadoop 2.6.0-cdh5.14.0 (as described in http://druid.io/docs/latest/operations/other-hadoop.html), which has worked fine for the past few years. Deep storage is in HDFS, in case that is important.

The MiddleManager runtime.properties looks like this:

druid.service=druid/middleManager
druid.plaintextPort=8082

# This is not the full list of Druid extensions, but common ones that people often use.
# You may need to change this list based on your particular setup.
druid.extensions.loadList=["druid-hdfs-storage", "mysql-metadata-storage"]

# Number of tasks per middleManager
druid.worker.capacity=1

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx2g -XX:MaxDirectMemorySize=2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dhadoop.mapreduce.job.user.classpath.first=true -Djava.io.tmpdir=/var/druid/tmp -Dnode.type=peon
druid.indexer.task.baseTaskDir=/var/druid/task

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.monitoring.monitors=
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.processing.numThreads=2
druid.indexer.fork.property.hadoop.mapreduce.map.java.opts=-Duser.timezone=UTC
druid.indexer.fork.property.hadoop.mapreduce.reduce.java.opts=-Duser.timezone=UTC
druid.indexer.fork.property.druid.extensions.loadList=["druid-hdfs-storage", "mysql-metadata-storage"]

# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=/var/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0"]

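As an aside, the default coordinates can also be overridden per task via the top-level hadoopDependencyCoordinates field of an index_hadoop task; we rely on the middleManager default instead, but a minimal illustrative fragment would look like:

```json
{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.6.0-cdh5.14.0"]
}
```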

We also tried other classloading options, such as mapreduce.job.classloader=true, but had no success with them either.
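For reference, we set that option in the jobProperties of the tuningConfig in the ingestion spec, roughly like this (the fragment is illustrative, field names are per the Druid Hadoop ingestion docs):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "jobProperties": {
      "mapreduce.job.classloader": "true"
    }
  }
}
```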

There was a bug that looks similar, but it was fixed in version 0.14.0: https://github.com/apache/incubator-druid/issues/6967

Can anybody help?

Thanks, Christian

Doesn’t anyone have an idea?

Hi Christian,

Just in case, could you please double-check that the classpath of the task contains all the necessary Hadoop configuration files?

It should be printed in the task log, on a line that starts with "Hadoop Container Druid Classpath is set to".
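To eyeball that line, you can split the classpath into one jar per line and grep for the jar that provides org.apache.hadoop.mapreduce.Job. A self-contained sketch (the classpath value here is illustrative; in practice, paste the one from your task log):

```shell
# Illustrative classpath value; copy the real one from the task log.
# Splitting on ':' gives one jar per line, so a missing jar is easy to
# spot with grep.
classpath='/opt/druid/lib/a.jar:/opt/druid/extensions/druid-hdfs-storage/hadoop-mapreduce-client-core-2.6.0-cdh5.14.0.jar'
echo "$classpath" | tr ':' '\n' | grep 'mapreduce-client-core'
# → /opt/druid/extensions/druid-hdfs-storage/hadoop-mapreduce-client-core-2.6.0-cdh5.14.0.jar
```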

Jihoon

Sorry for coming back to you so late. We postponed the Druid update in May, but I am getting back to it now.

I have chosen Druid 0.15.1 now and the error is still the same, BUT your answer already helped: the classpath is different now. It is missing the jar that contains the missing Job class (and some other jars too):

…/apache-druid-0.13.0-incubating/extensions/druid-hdfs-storage/hadoop-mapreduce-client-core-2.6.0-cdh5.14.0.jar


I am looking into it further now. Maybe it has something to do with my custom Hadoop version, which I cannot change? Maybe some dependencies changed somewhere? I am also not sure whether hadoop-mapreduce-client-core is even supposed to come from the druid-hdfs-storage extension.
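A quick way to check whether the jar is still shipped at all is to scan the extensions directory for it by file name. A self-contained sketch (the directory layout below is recreated locally for illustration; in practice, point find at the real extensions directory of the install):

```shell
# Recreate the layout we saw under 0.13.0 locally so the example runs
# anywhere; with a real install, run the find against
# $DRUID_HOME/extensions instead.
ext_dir=$(mktemp -d)
mkdir -p "$ext_dir/druid-hdfs-storage"
touch "$ext_dir/druid-hdfs-storage/hadoop-mapreduce-client-core-2.6.0-cdh5.14.0.jar"
find "$ext_dir" -name 'hadoop-mapreduce-client-core-*.jar'
```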

Okay, I guess we are affected by this bug: https://github.com/apache/incubator-druid/pull/8339