Druid 0.9.0: pull-deps pulls the wrong version of Hadoop dependencies for druid-hdfs-storage

Hi, I think I have a problem related to https://github.com/druid-io/druid/issues/2476. I’m running pull-deps with this:

java -cp "lib/*" io.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:2.6.0 -c io.druid.extensions:druid-hdfs-storage:0.9.0 --clean

I correctly get the 2.6.0 set of jars in the hadoop-dependencies directory, but it also pulls a bunch of Hadoop 2.3.0 jars into extensions/druid-hdfs-storage. This causes my Hadoop re-ingestion jobs to fail. However, if I replace the 2.3.0 Hadoop jars with 2.6.0 jars, everything works fine (though I must set hadoop.mapreduce.job.user.classpath.first=true in the job config to avoid Jackson problems).

Does anyone have a more elegant work-around for this? Thanks!

–T

Hey TJ,

Sorry for the delay.

The Hadoop version you pass to pull-deps only affects what is used for running MR jobs. The HDFS storage extension always uses 2.3.0, since that’s the version the extension is compiled against. If I were in your position, I think the best workarounds would be recompiling that extension or replacing the jars by hand.
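
If you go the replace-by-hand route, something like the following could save some typing. This is only a rough sketch, not anything that ships with Druid: `swap_hadoop_jars` is a made-up helper name, and it assumes the Hadoop jars follow the usual `hadoop-<module>-<version>.jar` naming and that pull-deps put the 2.6.0 jars somewhere like hadoop-dependencies/hadoop-client/2.6.0 (check your own layout first).

```shell
# Hypothetical helper (not part of Druid): replace Hadoop jars of one version
# inside an extension directory with same-named jars of another version.
# Args: <extension dir> <dir holding the new jars> <old version> <new version>
swap_hadoop_jars() {
  local ext_dir="$1" deps_dir="$2" old_ver="$3" new_ver="$4"
  local old_jar base new_jar
  for old_jar in "$ext_dir"/hadoop-*-"$old_ver".jar; do
    # If the glob matched nothing, the pattern is left as a literal; skip it.
    [ -e "$old_jar" ] || continue
    # e.g. hadoop-common-2.3.0.jar -> hadoop-common
    base="$(basename "$old_jar" "-$old_ver.jar")"
    new_jar="$deps_dir/$base-$new_ver.jar"
    if [ -f "$new_jar" ]; then
      # Only remove the old jar once the new one has been copied in.
      cp "$new_jar" "$ext_dir/" && rm "$old_jar"
      echo "replaced $base: $old_ver -> $new_ver"
    else
      # No counterpart found: leave the old jar in place and report it.
      echo "no $new_ver jar for $base; left as is" >&2
    fi
  done
}
```

From the Druid install directory you would call it along the lines of `swap_hadoop_jars extensions/druid-hdfs-storage hadoop-dependencies/hadoop-client/2.6.0 2.3.0 2.6.0`, adjusting the paths to match your setup. Jars with no 2.6.0 counterpart are left untouched, so it is worth reading the messages it prints before restarting anything.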

Btw, could I ask you to please post on https://github.com/druid-io/druid/issues/2476 with more detail about which exception you get, and where, when running with the config you mentioned? We’re working on a Hadoop testing tool and it would be good to make sure we’re catching the case you’re running into.