Materialized view - com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found

Hi,

I’m trying to use the Materialized View extension (available in Druid 0.13.0) and I’ve got this error:

2019-01-14T12:18:23,539 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.HadoopIndexTask - Encountered exception in run():
org.apache.druid.java.util.common.ISE: Hadoop dependency [/opt/druid/druid-0.13.0-bin/hadoop-dependencies/hadoop-client/2.3.0] didn't exist!?
at org.apache.druid.initialization.Initialization.getHadoopDependencyFilesToLoad(Initialization.java:281) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopTask.buildClassLoader(HadoopTask.java:158) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopTask.buildClassLoader(HadoopTask.java:132) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask.runInternal(HadoopIndexTask.java:262) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:232) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
2019-01-14T12:18:23,564 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_materialized_view_test_2019-01-14T12:18:16.453Z]
2019-01-14T12:18:23,564 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_materialized_view_test_2019-01-14T12:18:16.453Z] status changed to [FAILED].
2019-01-14T12:18:23,571 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_materialized_view_test_2019-01-14T12:18:16.453Z",
  "status" : "FAILED",
  "duration" : 30,
  "errorMsg" : "org.apache.druid.java.util.common.ISE: Hadoop dependency [/opt/druid/druid-0.13.0-bin/hadoop-depende…"
}
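
My guess is that 2.3.0 is a default Hadoop coordinate coming from somewhere in the materialized view task, since my spec doesn't pin a version and hadoop-dependencies already contains a newer client. If I've read the docs correctly, a Hadoop task can pin the version with hadoopDependencyCoordinates; a sketch of what I believe that looks like (2.8.3 is just an example version):

{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.8.3"],
  "spec" : { ... }
}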

Because of this, I ran the following command to pull the missing dependency:

java -classpath "./lib/*" org.apache.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:2.3.0

That fixed the missing dependency, but now the task fails with a new error:

2019-01-14T12:53:36,504 ERROR [task-runner-0-priority-0] org.apache.druid.indexer.hadoop.DatasourceInputFormat - Exception getting splits
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1882) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2298) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2311) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[hadoop-common-2.3.0.jar:?]
at org.apache.druid.indexer.hadoop.DatasourceInputFormat$1$1.listStatus(DatasourceInputFormat.java:165) ~[druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.druid.indexer.hadoop.DatasourceInputFormat.lambda$getLocations$2(DatasourceInputFormat.java:205) ~[druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) [?:1.8.0_191]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) [?:1.8.0_191]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [?:1.8.0_191]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [?:1.8.0_191]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) [?:1.8.0_191]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [?:1.8.0_191]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) [?:1.8.0_191]
at org.apache.druid.indexer.hadoop.DatasourceInputFormat.getFrequentLocations(DatasourceInputFormat.java:228) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexer.hadoop.DatasourceInputFormat.toDataSourceSplit(DatasourceInputFormat.java:186) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexer.hadoop.DatasourceInputFormat.getSplits(DatasourceInputFormat.java:115) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_191]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) [hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) [hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:206) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexer.JobHelper.runJobs(JobHelper.java:376) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96) [druid-indexing-hadoop-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessingRunner.runTask(HadoopIndexTask.java:612) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at org.apache.druid.indexing.common.task.HadoopIndexTask.runInternal(HadoopIndexTask.java:380) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:232) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1788) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880) ~[hadoop-common-2.3.0.jar:?]
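
As far as I can tell, com.amazon.ws.emr.hadoop.fs.EmrFileSystem is Amazon's proprietary EMRFS implementation. It ships with EMR itself rather than with any open-source hadoop-client artifact, which would explain why pull-deps can't fetch it: the EMR cluster's core-site.xml maps s3:// to that class, but the jar is not on the task's classpath. One workaround I'm considering is copying the EMRFS jars from the EMR master into the Hadoop dependency directory the task loads; the source path below is my assumption of the usual EMR layout:

# Assumed EMRFS jar location on an EMR master; adjust for your AMI.
cp /usr/share/aws/emr/emrfs/lib/*.jar /opt/druid/druid-0.13.0-bin/hadoop-dependencies/hadoop-client/2.3.0/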

Do you know why it is asking for hadoop-client 2.3.0 (there is a newer version on the classpath), and what I have to do to fix the last error?
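
Alternatively, if I've understood Hadoop's filesystem configuration correctly, the job could be told to ignore EMR's mapping by overriding fs.s3.impl in the task's tuningConfig.jobProperties, e.g. pointing s3:// at an open-source implementation (S3A requires a newer Hadoop client than 2.3.0, so this is only a sketch):

"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "fs.s3.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem"
  }
}

Does either of those sound like the right direction?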

Thank you, and congrats on the latest release!

Mihai

I have opened an issue.