Hadoop Index Task fails with ClassNotFoundException when trying to use Azure Data Lake Store

Hi,

I have Druid set up to use a Hadoop cluster (HDInsight) hosted in Azure. It works fine when I use Azure Blob as storage, but recently I wanted to try out Azure Data Lake Store instead.

When I submit an index job I get “java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found” in the log.

Any help to get this working would be great!

This cluster is using Druid 0.9.1.1.

Details:
I have copied the cluster *.xml files to my Druid middle manager node. I also downloaded azure-data-lake-store-sdk.jar and put it in my hadoop-dependencies folder on the middle manager node. I should also add that both the data I want to index and the Druid deep storage are on Azure Blob. The log shows that it finds my data files correctly:
2017-11-29T16:20:49,841 INFO [task-runner-0-priority-0] io.druid.indexer.path.GranularityPathSpec - Checking path[wasb://data@myaccount.blob.core.windows.net/sandbox/2017/11/29/15]
2017-11-29T16:20:50,017 INFO [task-runner-0-priority-0] io.druid.indexer.path.GranularityPathSpec - Appending path [wasb://data@myaccount.blob.core.windows.net/sandbox/2017/11/29/15/-1824167840_ac5a4be1a7a141a4a30f98bcaadb4dfa_1.avro]

I can see in the log that it does something (loads it?) with the Data Lake SDK jar:
2017-11-29T16:20:47,436 INFO [task-runner-0-priority-0] io.druid.initialization.Initialization - added URL[file:/opt/druid/hadoop-dependencies/hadoop-client/2.7.3/azure-data-lake-store-sdk-2.3.0-preview2.jar]

But then later it prints an exception:
2017-11-29T16:20:50,062 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_bfevents_sandbox_2017-11-29T16:20:44.146Z, type=index_hadoop, dataSource=bfevents_sandbox}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
… 7 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195) ~[?:?]
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) ~[?:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) ~[?:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357) ~[?:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[?:?]
at io.druid.indexer.JobHelper.setupClasspath(JobHelper.java:122) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:106) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:323) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
… 7 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101) ~[?:?]
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193) ~[?:?]
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654) ~[?:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) ~[?:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357) ~[?:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[?:?]
at io.druid.indexer.JobHelper.setupClasspath(JobHelper.java:122) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:106) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:323) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.2-SNAPSHOT]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
… 7 more
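
To narrow this down, one option is to check whether any jar on the middle manager node actually contains the class named in the exception. Below is a minimal sketch in Python, assuming the jars live under /opt/druid/hadoop-dependencies as in the log line above. The function name jars_containing is made up for illustration and is not part of Druid or Hadoop.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return paths of jars under jar_dir that contain the given class.

    class_name is a fully qualified Java class name, for example
    "org.apache.hadoop.fs.adl.HdiAdlFileSystem".
    """
    # A jar is a zip archive; a class is stored as a path ending in ".class".
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return matches


# Example call (the directory is an assumption taken from the log above):
# print(jars_containing("org.apache.hadoop.fs.adl.HdiAdlFileSystem",
#                       "/opt/druid/hadoop-dependencies"))
```

An empty result would suggest that none of the jars in that folder provides the class, so the ClassNotFoundException would be expected regardless of how the jars are loaded.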

Thanks!
Victor