Hadoop HA support


Does Druid currently support Hadoop HA (http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)?

If it does, how do I configure it?

I’m using Druid 0.7.0 :slight_smile:


By "support" I mean using HDFS as deep storage and Hadoop for indexing jobs.

Yes, Druid works fine with both HA HDFS as well as HA YARN for running indexing jobs.

Thanks TJ!
Would you mind giving a tip on how to configure it?

Currently, without HA, I do it by


in the middleManager's runtime.properties. Sorry, I'm not very familiar with Hadoop.

After checking the source code, I figured it out myself: adding settings like the ones below to the Java start arguments should work:

-Dhadoop.fs.defaultFS=hdfs://mycluster \
-Dhadoop.dfs.nameservices=mycluster \
-Dhadoop.dfs.ha.namenodes.mycluster=nn1,nn2 \
-Dhadoop.dfs.namenode.rpc-address.mycluster.nn1=hadoop.namenode1:9000 \
-Dhadoop.dfs.namenode.rpc-address.mycluster.nn2=hadoop.namenode2:9000 \
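Each flag above is a normal Hadoop key with "hadoop." in front of it. Assuming Druid copies JVM system properties with that prefix into the Hadoop job configuration after stripping the prefix (which is what the flag naming suggests), the mapping can be sketched like this. The class and method names are hypothetical, and a plain Map stands in for Hadoop's Configuration so the sketch runs without Hadoop on the classpath.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical helper (not Druid's actual code): shows how a JVM flag such as
 * -Dhadoop.dfs.nameservices=mycluster can become the Hadoop configuration key
 * dfs.nameservices by stripping the "hadoop." prefix.
 */
final class HadoopPropertyInjector {
    private static final String PREFIX = "hadoop.";

    /** Copies every "hadoop."-prefixed property, minus the prefix, into a sorted map. */
    static Map<String, String> extract(Properties props) {
        Map<String, String> conf = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                conf.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Reads the real JVM system properties, i.e. whatever was passed with -D.
        extract(System.getProperties()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With Java 11 or later, `java -Dhadoop.fs.defaultFS=hdfs://mycluster -Dhadoop.dfs.nameservices=mycluster HadoopPropertyInjector.java` should print `dfs.nameservices=mycluster` and `fs.defaultFS=hdfs://mycluster`, sorted by key. Properties without the prefix (for example druid.* settings) are left out.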


Note that the -D settings above should match the ones in your Hadoop core-site.xml and hdfs-site.xml. Also, if you are using HDFS both as deep storage and for storing task logs, you may run into some problems; this PR should help: https://github.com/druid-io/druid/pull/1379
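Because the HA keys refer to each other (the nameservice ID names the namenode list, and each namenode ID names an rpc-address key), a small pre-flight check can catch a missed or mistyped key before starting the middleManager. This is a hypothetical sketch, not part of Druid or Hadoop, using the key names from the Hadoop QJM HA guide linked in the question. That guide also lists dfs.client.failover.proxy.provider.[nameservice ID] as the client-side class used to find the active NameNode, so the sketch checks for it too. Whether you need to pass that one explicitly depends on what else is already on your classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-flight check (not part of Druid or Hadoop): verifies that the
 * HDFS HA keys refer to each other consistently. Keys are given without the
 * "hadoop." prefix, i.e. as they would appear in core-site.xml / hdfs-site.xml.
 */
final class HaConfigCheck {

    /** Returns human-readable problems; an empty list means no gaps were found. */
    static List<String> check(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        String services = conf.getOrDefault("dfs.nameservices", "").trim();
        if (services.isEmpty()) {
            problems.add("dfs.nameservices is not set");
            return problems;
        }
        // Strip trailing slashes so "hdfs://mycluster/" also matches.
        String defaultFs = conf.getOrDefault("fs.defaultFS", "").replaceAll("/+$", "");
        boolean defaultFsMatches = false;
        for (String rawNs : services.split(",")) {
            String ns = rawNs.trim();
            if (defaultFs.equals("hdfs://" + ns)) {
                defaultFsMatches = true;
            }
            // HDFS clients use this class to locate the active NameNode.
            String providerKey = "dfs.client.failover.proxy.provider." + ns;
            if (!conf.containsKey(providerKey)) {
                problems.add(providerKey + " is not set");
            }
            String namenodes = conf.get("dfs.ha.namenodes." + ns);
            if (namenodes == null) {
                problems.add("dfs.ha.namenodes." + ns + " is not set");
                continue;
            }
            // Every namenode ID listed for this nameservice needs an RPC address.
            for (String rawNn : namenodes.split(",")) {
                String key = "dfs.namenode.rpc-address." + ns + "." + rawNn.trim();
                if (!conf.containsKey(key)) {
                    problems.add(key + " is not set");
                }
            }
        }
        if (!defaultFsMatches) {
            problems.add("fs.defaultFS does not point at any nameservice: " + defaultFs);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The settings from the post, with the "hadoop." prefix already stripped.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.defaultFS", "hdfs://mycluster");
        conf.put("dfs.nameservices", "mycluster");
        conf.put("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.put("dfs.namenode.rpc-address.mycluster.nn1", "hadoop.namenode1:9000");
        conf.put("dfs.namenode.rpc-address.mycluster.nn2", "hadoop.namenode2:9000");
        check(conf).forEach(System.out::println);
    }
}
```

Run against the five settings from the post, main prints a single line, `dfs.client.failover.proxy.provider.mycluster is not set`, which is worth comparing against your own hdfs-site.xml.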