druid-0.8.3 - NoClassDefFoundError: org/apache/hadoop/conf/Configuration

I am trying to start my Druid services and unfortunately get the following exception. Does anyone have ideas on how to address these classpath woes?

```
Starting Druid Overlord Service...
Feb 10 18:37:26 server01 java[2474]: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at io.druid.storage.hdfs.HdfsStorageDruidModule.configure(HdfsStorageDruidModule.java:93)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:230)
	at com.google.inject.spi.Elements.getElements(Elements.java:103)
	at com.google.inject.spi.Elements.getElements(Elements.java:94)
	at com.google.inject.util.Modules$RealOverriddenModuleBuilder$1.configure(Modules.java:173)
	at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:230)
	at com.google.inject.spi.Elements.getElements(Elements.java:103)
	at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:136)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
	at com.google.inject.Guice.createInjector(Guice.java:96)
	at com.google.inject.Guice.createInjector(Guice.java:73)
	at com.google.inject.Guice.createInjector(Guice.java:62)
	at io.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:415)
	at io.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:55)
	at io.druid.cli.ServerRunnable.run(ServerRunnable.java:37)
	at io.druid.cli.Main.main(Main.java:99)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 17 more
```

I have the following related values set in my `druid-0.8.3/config/_common/common.runtime.properties` file:

```
druid.extensions.coordinates=["io.druid.extensions:postgresql-metadata-storage","io.druid.extensions:druid-kafka-eight","io.druid.extensions:druid-hdfs-storage","io.druid.extensions:druid-histogram"]
druid.extensions.localRepository=extensions-repo

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.1","org.apache.hadoop:hadoop-hdfs:2.7.1"]
```

I did attempt to download my dependencies directly via the `pull-deps` command (http://druid.io/docs/latest/tutorials/firewall.html), though that was unsuccessful.

```
[druid]$ cd druid-0.8.3
[druid]$ java -classpath "config\_common;lib\*" io.druid.cli.Main tools pull-deps
Error: Could not find or load main class io.druid.cli.Main
```
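As an aside, the `Could not find or load main class` error above is most likely down to classpath syntax rather than `pull-deps` itself: `\` and `;` are Windows conventions, while the JVM on Linux splits the classpath on `:` and uses `/` in paths, so `config\_common;lib\*` is parsed as one nonexistent entry. A minimal sketch of the Unix form (the `java` invocation is commented out since it assumes a local Druid install):

```shell
# On Linux the classpath separator is ':' (not ';') and paths use '/'.
CP="config/_common:lib/*"
echo "$CP"
# Run from inside druid-0.8.3/:
# java -classpath "$CP" io.druid.cli.Main tools pull-deps
```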

So my workaround is to add the Hadoop classpath to Druid's classpath, which differs from the official documentation here (http://druid.io/docs/latest/configuration/production-cluster.html). My updated systemd service (/etc/systemd/system/druid-historical.service) with the Hadoop classpath is given below.
Question: should I be adding the Hadoop classpath to my Druid node configuration?

Druid Historical Node Service for systemd:

```
[Unit]
Description=Druid Historical Node Service
Before=druid-broker.service
After=syslog.target network.target nss-lookup.target druid-overlord.service druid-middlemanager.service druid-coordinator.service

[Service]
Type=simple
WorkingDirectory=/opt/druid
ExecStart=/usr/bin/sh -c "/usr/bin/java -Xms4g -Xmx4g \
  -XX:NewSize=2g -XX:MaxNewSize=2g -XX:MaxDirectMemorySize=8g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/opt/druid/druid-tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
  -classpath config/_common:config/historical:lib/*:$(/usr/hdp/current/hadoop-client/bin/hadoop classpath) io.druid.cli.Main server historical"
Restart=on-failure
LimitNOFILE=10000
User=druid

[Install]
WantedBy=multi-user.target
```

Hi Mark, what version of Hadoop do you have? Usually folks have had luck including Hadoop jars in the classpath.

I am using Hadoop 2.7.1 (Managed by Apache Ambari).

Unfortunately since my post, I have discovered some issues with my classpath. So back to the drawing board.

Pretty sure I have it working with an updated classpath that includes the following:

  • /opt/druid/extensions-repo/org/apache/hadoop/hadoop-client/2.7.1/hadoop-client-2.7.1.jar
  • /opt/druid/extensions-repo/org/apache/hadoop/hadoop-hdfs/2.7.1/hadoop-hdfs-2.7.1.jar
  • /usr/hdp/current/hadoop-client/*
  • /usr/hdp/current/hadoop-client/lib/*

I ran some quick tests that included using the Druid Indexer for some re-segmenting, using Tranquility for adding new values, and running some Druid queries, then looking at the logs for exceptions. Hopefully this continues to work.

Just to restate, I added the hadoop-client and hadoop-hdfs libraries to the path (as druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.1","org.apache.hadoop:hadoop-hdfs:2.7.1"]), as I am worried Druid doesn't include them at runtime.

My updated Druid Historical Node Service for systemd:

/etc/systemd/system/druid-historical.service

```
[Unit]
Description=Druid Historical Service
Before=druid-broker.service
After=syslog.target network.target nss-lookup.target druid-overlord.service druid-middlemanager.service druid-coordinator.service

[Service]
Type=simple
WorkingDirectory=/opt/druid
ExecStart=/usr/bin/sh -c "/usr/bin/java -Xms4g -Xmx4g \
  -XX:NewSize=2g -XX:MaxNewSize=2g -XX:MaxDirectMemorySize=8g \
  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -Djava.io.tmpdir=/opt/druid/druid-tmp \
  -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
  -classpath config/_common:config/historical:lib/*:$(/opt/druid/druid-classpath.sh) io.druid.cli.Main server historical"
Restart=on-failure
LimitNOFILE=10000
User=druid

[Install]
WantedBy=multi-user.target
```

My external Druid Classpath configuration file:

/opt/druid/druid-classpath.sh

```
#!/bin/sh
echo "/opt/druid/extensions-repo/org/apache/hadoop/hadoop-client/2.7.1/hadoop-client-2.7.1.jar:/opt/druid/extensions-repo/org/apache/hadoop/hadoop-hdfs/2.7.1/hadoop-hdfs-2.7.1.jar:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*"
```
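The helper has to emit the whole classpath on a single line, because its output is spliced into the `java -classpath` argument via `$(...)`; embedded newlines or stray whitespace would split the argument apart. A minimal, self-contained sketch of the idea (the paths and script name here are placeholders, not my real ones):

```shell
# Write a tiny classpath helper to a temp location (placeholder paths).
cat > /tmp/druid-classpath-demo.sh <<'EOF'
#!/bin/sh
# printf avoids echo's trailing-newline quirks across shells.
printf '%s' "/opt/a.jar:/opt/b.jar:/usr/hdp/current/hadoop-client/*"
EOF
chmod +x /tmp/druid-classpath-demo.sh
# A service would then use: -classpath "lib/*:$(/tmp/druid-classpath-demo.sh)"
/tmp/druid-classpath-demo.sh
```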

So I figured out what was causing these NoClassDefFoundError exceptions (and forcing me to manually update the classpath): corrupted Maven artifacts. Now I can run Druid without any manual additions to the classpath.

A diff between the extensions-repo of a working Druid instance and that of the instance at issue (the one requiring a classpath modification to run) showed that certain files had different content and sizes than expected. For instance:

```
Files extensions-repo.bad/com/sun/jersey/jersey-project/1.9/jersey-project-1.9.pom and extensions-repo.good/com/sun/jersey/jersey-project/1.9/jersey-project-1.9.pom differ
Files extensions-repo.bad/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar and extensions-repo.good/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar differ
Files extensions-repo.bad/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar and extensions-repo.good/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar differ
Files extensions-repo.bad/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.pom and extensions-repo.good/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.pom differ
Files extensions-repo.bad/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.pom.sha1 and extensions-repo.good/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.pom.sha1 differ
Files extensions-repo.bad/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar and extensions-repo.good/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar differ
Files extensions-repo.bad/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.pom and extensions-repo.good/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.pom differ
Files extensions-repo.bad/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.pom and extensions-repo.good/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.pom differ
Files extensions-repo.bad/junit/junit/4.11/junit-4.11.pom and extensions-repo.good/junit/junit/4.11/junit-4.11.pom differ
Files extensions-repo.bad/log4j/log4j/1.2.14/log4j-1.2.14.pom.sha1 and extensions-repo.good/log4j/log4j/1.2.14/log4j-1.2.14.pom.sha1 differ
Files extensions-repo.bad/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar and extensions-repo.good/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar differ
Files extensions-repo.bad/org/apache/commons/commons-parent/9/commons-parent-9.pom and extensions-repo.good/org/apache/commons/commons-parent/9/commons-parent-9.pom differ
Files extensions-repo.bad/org/apache/hadoop/hadoop-client/2.3.0/hadoop-client-2.3.0.pom and extensions-repo.good/org/apache/hadoop/hadoop-client/2.3.0/hadoop-client-2.3.0.pom differ
Files extensions-repo.bad/org/apache/hadoop/hadoop-common/2.3.0/hadoop-common-2.3.0.jar and extensions-repo.good/org/apache/hadoop/hadoop-common/2.3.0/hadoop-common-2.3.0.jar differ
Files extensions-repo.bad/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar and extensions-repo.good/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar differ
Files extensions-repo.bad/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.pom and extensions-repo.good/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.pom differ
Files extensions-repo.bad/org/slf4j/slf4j-api/1.7.2/slf4j-api-1.7.2.pom and extensions-repo.good/org/slf4j/slf4j-api/1.7.2/slf4j-api-1.7.2.pom differ
Files extensions-repo.bad/org/slf4j/slf4j-parent/1.7.2/slf4j-parent-1.7.2.pom and extensions-repo.good/org/slf4j/slf4j-parent/1.7.2/slf4j-parent-1.7.2.pom differ
```
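A listing like the one above can be produced with `diff -rq`, which recursively reports only which files differ. A tiny self-contained sketch (directory names and file contents are placeholders):

```shell
# Build two miniature repos whose only artifact differs in content.
mkdir -p /tmp/repo.good /tmp/repo.bad
echo "intact pom"    > /tmp/repo.good/artifact.pom
echo "truncated pom" > /tmp/repo.bad/artifact.pom
# -r recurses, -q prints only "Files ... differ" lines.
# diff exits nonzero when differences exist, hence the '|| true'.
diff -rq /tmp/repo.bad /tmp/repo.good || true
```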

On inspection, all the Maven artifacts at issue had a companion file with the suffix `.sha1-in-progress`.

```
$ ll
total 260
-rw-r--r--. 1 myapp myapp    240 Feb 10 18:00 aether-edaa6443-c34d-4ed8-a968-a1f177d52311-commons-beanutils-core-1.8.0.jar.sha1-in-progress
-rw-r--r--. 1 myapp myapp 246579 Feb 10 18:00 commons-beanutils-core-1.8.0.jar
-rw-r--r--. 1 myapp myapp   1639 Feb 10 17:59 commons-beanutils-core-1.8.0.pom
-rw-r--r--. 1 myapp myapp     32 Feb 10 17:59 commons-beanutils-core-1.8.0.pom.md5
-rw-r--r--. 1 myapp myapp     40 Feb 10 17:59 commons-beanutils-core-1.8.0.pom.sha1
```
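Those leftover marker files make the corrupted artifacts easy to enumerate with `find`; a sketch under placeholder paths (the real repository would be something like /opt/druid/extensions-repo):

```shell
# Recreate a miniature extensions-repo with one interrupted-download marker.
mkdir -p /tmp/extensions-repo/commons-beanutils/commons-beanutils-core/1.8.0
touch /tmp/extensions-repo/commons-beanutils/commons-beanutils-core/1.8.0/demo.jar.sha1-in-progress
# List every artifact whose checksum download never completed.
find /tmp/extensions-repo -name '*.sha1-in-progress'
```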

The Fix: