Hadoop HA support

Hi!

Does Druid currently support Hadoop HA (http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html) ?

If it does, how do I configure it?

I’m using Druid 0.7.0 :slight_smile:

Thanks.

And by support here I mean using HDFS as deep storage and Hadoop for indexing jobs.

Yes, Druid works fine with both HA HDFS and HA YARN for running indexing jobs.
–T

Thanks TJ!
Do you mind giving a tip on how to configure it?

Currently, without HA, I do it with

druid.indexer.fork.property.druid.storage.storageDirectory=hdfs://master.druid.hadoop:9000/druid/segment_data

in the middleManager’s runtime.properties. I’m not very familiar with Hadoop, sorry.

After checking the source code, I figured it out myself. Adding configs like the ones below to the Java start arguments should work:

-Dhadoop.fs.defaultFS=hdfs://mycluster \

-Dhadoop.dfs.nameservices=mycluster \

-Dhadoop.dfs.ha.namenodes.mycluster=nn1,nn2 \

-Dhadoop.dfs.namenode.rpc-address.mycluster.nn1=hadoop.namenode1:9000 \

-Dhadoop.dfs.namenode.rpc-address.mycluster.nn2=hadoop.namenode2:9000 \

-Dhadoop.dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

Note that those settings should be the same as the configs in your Hadoop’s core-site.xml and hdfs-site.xml. Also, if you are using HDFS as deep storage and to store task logs, you may face some problems; this PR should help: https://github.com/druid-io/druid/pull/1379
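
For reference, with HA enabled the deep storage path should presumably point at the nameservice instead of a single namenode host:port, so the earlier middleManager property would become something like this (assuming the nameservice is named mycluster, as above):

druid.indexer.fork.property.druid.storage.storageDirectory=hdfs://mycluster/druid/segment_data

The HDFS client then resolves mycluster to whichever namenode is currently active, using the failover proxy provider configured above.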