Advice on Druid cluster configuration on MapR 5

We are using MapR 5 as our production cluster. MapR 5 implements its own filesystem, which can run MapReduce jobs but, of course, is not HDFS.
I tried using the hdfs-storage extension to make Druid talk to that cluster, and, as I expected, it did not work.
MapR, however, provides an NFS bridge, which allows systems running alongside it to access the cluster ‘locally’ as a mounted filesystem.
So I was wondering how one would run Druid in this situation. Does it mean Hadoop-based batch ingestion is not available and I can only use the indexing task?
Are there any workarounds?

Hi Pavel,

Probably the best way is to implement the Druid interfaces that let Druid talk to the MapR filesystem.

Otherwise, I am not familiar with the NFS bridge, but if my understanding is correct, the bridge lets you access the MapR cluster as a POSIX network filesystem, and that won't make batch ingestion work.

Hi Pavel, what errors did you see with the hdfs-storage extension? I wonder if it is a dependency conflict problem.

I think the best bet in this situation is actually to implement a custom deep storage extension for the MapR filesystem. There are many examples of how to implement different types of deep storage. Hadoop-based batch ingestion should still be available regardless, and folks out there do run with NFS as their deep storage.

We are also attempting to use Druid with MapR-FS as our filesystem. When we attempt to load data, Druid fails in JobHelper.serializeOutIndex, where the case statement switches on getScheme(); MapR returns "maprfs", which throws "Unknown file system scheme [maprfs]".

com.metamx.common.IAE: Unknown file system scheme [maprfs]
    at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:417)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703)
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:620)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:458)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:278)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.hadoop.mapred.Child.main(Child.java:267)

Since MapR-FS is supposed to be API-compatible with HDFS, we were going to try simply adding "maprfs" to the case statement and rebuilding Druid, but I suspect we will have additional dependency issues.
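
For reference, here is the kind of one-line change we had in mind. This is an illustrative sketch, not the actual Druid source: it just shows a scheme check extended so that "maprfs" takes the same path as "hdfs".

import java.net.URI;

// Illustrative only -- not the actual JobHelper code. It demonstrates
// letting "maprfs" fall through to the "hdfs" handling instead of
// hitting the IAE shown in the stack trace above.
public class SchemeCheck
{
  public static boolean isHdfsCompatible(URI uri)
  {
    switch (uri.getScheme()) {
      case "hdfs":
      case "maprfs": // MapR-FS is API-compatible with HDFS, so fall through
        return true;
      default:
        return false;
    }
  }

  public static void main(String[] args)
  {
    // Prints "true" for a maprfs URI once the extra case is in place.
    System.out.println(isHdfsCompatible(URI.create("maprfs:///user/druid/segments")));
  }
}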

As the OP suggested, configuring Druid to use local deep storage over the NFS bridge might also work, but that does not seem like a long-term, production-capable solution.
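
For anyone who wants to test that interim setup anyway, these are the relevant common runtime properties. The mount point below is an assumption about a typical MapR NFS layout; adjust it to your cluster:

# Hypothetical example: point Druid's "local" deep storage at the MapR
# NFS mount so every node sees the same segment directory.
druid.storage.type=local
druid.storage.storageDirectory=/mapr/my.cluster.com/druid/segments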

I think I will try to implement a MapR deep storage extension. I like this product and I really want it to work with our filesystem. Kafka is, of course, a good alternative, but we do have a lot of historical data nonetheless. Do I understand correctly that, at a high level, I would basically have to kick off a MapReduce job on the cluster to index the data and then load it as segments into Druid based on the provided spec file?

Hi Pavel,

Great. I guess the only way to make batch ingestion work is to write a proper deep storage module for the MapR filesystem. Druid has two good examples that you can follow: the first is HDFS and the second is S3.
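
To make the shape of that work concrete, here is a rough skeleton modeled on the HDFS extension. The class name and bindings are hypothetical, and the interfaces are from the io.druid-era API, so verify them against the Druid version you build against:

import com.fasterxml.jackson.databind.Module;
import com.google.inject.Binder;
import io.druid.initialization.DruidModule;

import java.util.Collections;
import java.util.List;

// Hypothetical skeleton of a MapR-FS deep storage extension, modeled on
// druid-hdfs-storage. The names here are illustrative, not from a release.
public class MapRStorageDruidModule implements DruidModule
{
  @Override
  public List<? extends Module> getJacksonModules()
  {
    return Collections.<Module>emptyList();
  }

  @Override
  public void configure(Binder binder)
  {
    // Register a DataSegmentPusher/DataSegmentPuller pair for a "maprfs"
    // storage type here, the same way the HDFS extension registers its
    // implementations; they would wrap the MapR FileSystem client.
  }
}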

This will enable you to use a MapReduce-based job to create the segments; the Druid historicals will then load them into memory and serve queries out of them.

Hope that helps.

Druid batch ingestion works by kicking off a job to Hadoop/Spark/something else that can create Druid segments. Once that job finishes, though, the loading of segments into Druid should be automated.
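
As a concrete illustration, a stripped-down Hadoop indexing task spec looks roughly like this. All values are placeholders, and the maprfs:// input path assumes the scheme issue above has been resolved:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": [] }
        }
      },
      "metricsSpec": [ { "type": "count", "name": "count" } ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "intervals": ["2015-01-01/2015-02-01"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "maprfs:///user/druid/input" }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}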

Hi Pavel,

Did you manage to get Druid to work on MapR?

I posted a response to a similar thread in druid-development about using JAR shading to get Druid running with MapR:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/druid-development/c6RdX2z2vcA/KhAxHwppBAAJ
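
For readers who do not want to follow the link, the core of that approach is the maven-shade-plugin's relocation feature. The snippet below is a hedged sketch; it relocates Guava as an example of a commonly conflicting dependency, but the linked post should be consulted for the exact packages MapR requires:

<!-- Illustrative maven-shade-plugin configuration: relocate a conflicting
     dependency (Guava here, as an example) so that Druid's copy does not
     clash with the one pulled in by the MapR client jars. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>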