Re: [druid-user] MapReduce job can't find library from classpath

You may want to check the logs on the Hadoop cluster to see if there are any errors there. Have you ever successfully ingested Hadoop data into this cluster?

To configure Druid for running ingestion tasks on a Hadoop cluster:

  • Update druid.indexer.task.hadoopWorkingPath in conf/druid/middleManager/ to a path on HDFS that you’d like to use for temporary files required during the indexing process. druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing is a common choice.

  • Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) on the classpath of your Druid nodes. You can do this by copying them into conf/druid/_common/.

  • Ensure that you have configured a distributed deep storage. Note that while you do need a distributed deep storage in order to load data with Hadoop, it doesn’t need to be HDFS. For example, if your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you are loading data using Hadoop or Elastic MapReduce.

  • Hadoop-based Druid ingestion task specs use a different format from built-in ingestion task specs. For an example, see the Tutorial: Load from Hadoop.
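If it helps to rule out the Druid-side configuration first, below is a rough Python sketch that checks the first three items above against a local conf/druid directory. It assumes the standard layout: MiddleManager settings in conf/druid/middleManager/runtime.properties, and common settings (including the deep storage setting druid.storage.type) in conf/druid/_common/common.runtime.properties. check_hadoop_setup and load_properties are hypothetical helpers, not part of Druid, and the parser only handles simple key=value lines. It cannot see errors on the Hadoop cluster itself, so the cluster logs are still the place to look for the classpath failure.

```python
from pathlib import Path

# Hadoop client configs that should be on the Druid classpath (conf/druid/_common/).
HADOOP_XMLS = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml")


def load_properties(path):
    """Hypothetical helper: parse simple key=value lines from a Java properties file.

    Simplification: ignores ':' separators, line continuations and escapes.
    Returns an empty dict if the file does not exist.
    """
    path = Path(path)
    if not path.is_file():
        return {}
    props = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):  # skip blanks and comments
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_hadoop_setup(conf_root):
    """Hypothetical helper: return a list of problems found under conf_root (e.g. conf/druid)."""
    conf = Path(conf_root)
    problems = []

    # 1. Working path for temporary indexing files, set on the MiddleManager.
    mm = load_properties(conf / "middleManager" / "runtime.properties")
    if not mm.get("druid.indexer.task.hadoopWorkingPath"):
        problems.append("druid.indexer.task.hadoopWorkingPath is not set")

    # 2. Hadoop configuration XMLs copied into the common config directory.
    common_dir = conf / "_common"
    for name in HADOOP_XMLS:
        if not (common_dir / name).is_file():
            problems.append(f"missing {name} in {common_dir}")

    # 3. Deep storage must be distributed; Druid defaults to "local" when unset.
    common = load_properties(common_dir / "common.runtime.properties")
    if common.get("druid.storage.type", "local") == "local":
        problems.append("deep storage is local; use a distributed store such as HDFS or S3")

    return problems


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "conf/druid"
    for problem in check_hadoop_setup(root):
        print("WARNING:", problem)
```

Run it from the Druid install directory (for example `python check_druid_hadoop.py conf/druid`, where the script name is arbitrary); no output means the three checks passed. The fourth item, the task spec format, isn't something a file check like this can validate.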