Hi - we currently have a Hadoop (HDFS/MapReduce) cluster on version 2.0.0-cdh4.4.0. Druid out of the box, which builds against hadoop-client 2.3.0, doesn't seem to work with our HDFS: I got the Overlord to run a simple batch index task, and it did create the segment, but when the task tried to push the segment to HDFS deep storage, I saw this error in the task logs:
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
Based on a Google search, this seems to indicate a version incompatibility. hadoop-client 2.3.0 uses protobuf 2.5.0, but hadoop 2.0.0-cdh4.4.0 uses protobuf 2.4.0a.
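To double-check that diagnosis, a throwaway class like the one below (the class name is my own, it's not part of Druid or Hadoop) should show which protobuf-java jar actually wins on the task's classpath:

import com.google.protobuf.Message;

// Throwaway diagnostic: print which protobuf-java jar is actually loaded,
// since the two Hadoop versions drag in different protobuf releases.
public class ProtobufVersionCheck {
  public static void main(String[] args) {
    // The jar location tells you which protobuf-java won on the classpath.
    System.out.println(Message.class.getProtectionDomain()
        .getCodeSource().getLocation());
    // May print null if the jar's manifest lacks Implementation-Version.
    System.out.println(Message.class.getPackage().getImplementationVersion());
  }
}

Run with the same classpath as the indexing task; in our case I'd expect it to point at the protobuf 2.5.0 jar pulled in by hadoop-client 2.3.0.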
So I tried compiling Druid against hadoop-client 2.0.0-cdh4.4.0, but compilation fails with:
[ERROR] /Users/zcox/code/druid/indexing-hadoop/src/main/java/io/druid/indexer/DetermineHashedPartitionsJob.java:[49,45] cannot find symbol
[ERROR] symbol: class CombineTextInputFormat
Indeed, org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat does not appear to exist in 2.0.0-cdh4.4.0; as far as I can tell, it was only added to the mapreduce API in Hadoop 2.1.
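In case anyone else hits this, here is a sketch of what a workaround might look like: a stand-in for CombineTextInputFormat built on CombineFileInputFormat, which I believe does exist in CDH4's mapreduce API. It mirrors how CombineTextInputFormat is implemented upstream, but all class names below are my own and I haven't tested this against our cluster:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical stand-in for CombineTextInputFormat on Hadoop versions that
// lack it: combines many small text files into fewer splits, reading each
// file slice with a plain LineRecordReader.
public class CombineTextInputFormatBackport
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    return new CombineFileRecordReader<LongWritable, Text>(
        (CombineFileSplit) split, context, WrappedLineRecordReader.class);
  }

  // CombineFileRecordReader instantiates one of these per file in the
  // combined split, via this exact (split, context, index) constructor.
  public static class WrappedLineRecordReader
      extends RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate = new LineRecordReader();
    private final FileSplit fileSplit;

    public WrappedLineRecordReader(CombineFileSplit split,
        TaskAttemptContext context, Integer idx)
        throws IOException, InterruptedException {
      // Carve the idx-th file chunk out of the combined split.
      fileSplit = new FileSplit(split.getPath(idx), split.getOffset(idx),
          split.getLength(idx), split.getLocations());
    }

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
        throws IOException, InterruptedException {
      // Ignore the combined split passed in here; read only our own chunk.
      delegate.initialize(fileSplit, context);
    }

    @Override
    public boolean nextKeyValue() throws IOException {
      return delegate.nextKeyValue();
    }

    @Override
    public LongWritable getCurrentKey() {
      return delegate.getCurrentKey();
    }

    @Override
    public Text getCurrentValue() {
      return delegate.getCurrentValue();
    }

    @Override
    public float getProgress() throws IOException {
      return delegate.getProgress();
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }
}

If that approach is sound, the call site in DetermineHashedPartitionsJob.java would just swap CombineTextInputFormat for a class like this, but I'd rather hear whether this path is even worth pursuing first.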
Is 2.0.0-cdh4.4.0 just too old a Hadoop version to use with Druid? I saw several other threads on the mailing list where versions very close to this one were being used, but those were from 2014.
Just wanted to do a sanity check before telling the rest of my team that we can't use Druid with our existing Hadoop cluster.