Using MapR-FS as local deep storage in Druid

I am trying to set up a Druid cluster using MapR-FS as my local deep storage. For this I used the mapr-loopbacknfs service to create an NFS mount on each server. All the services were up and running, but I hit a problem when I try to ingest data using

bin/post-index-task --url --file retail.json

  "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/opt/imply/sample-data",
        "filter": "retail*"

If I choose baseDir as a local path like /opt/imply/sample-data, I get the exception below.

2016-08-12T11:20:33,260 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_retail_2016-08-12T11:20:28.522Z] status changed to [RUNNING].
2016-08-12T11:20:33,261 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_retail_2016-08-12T11:20:28.522Z]: LockListAction{}
2016-08-12T11:20:33,269 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_retail_2016-08-12T11:20:28.522Z] to overlord[]: LockListAction{}
2016-08-12T11:20:33,276 INFO [main] org.eclipse.jetty.server.Server - jetty-9.2.5.v20141112
2016-08-12T11:20:33,333 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [retail*] in and beneath [/opt/imply/sample-data]
2016-08-12T11:20:33,345 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_retail_2016-08-12T11:20:28.522Z, type=index, dataSource=retail}]
java.lang.IllegalArgumentException: Parameter 'directory' is not a directory
  at ~[commons-io-2.4.jar:2.4]
  at ~[commons-io-2.4.jar:2.4]
  at io.druid.segment.realtime.firehose.LocalFirehoseFactory.connect( ~[druid-server-]
  at io.druid.segment.realtime.firehose.LocalFirehoseFactory.connect( ~[druid-server-]
  at io.druid.indexing.common.task.IndexTask.getDataIntervals( ~[druid-indexing-service-]
  at ~[druid-indexing-service-]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ [druid-indexing-service-]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ [druid-indexing-service-]
  at [?:1.8.0_91]
  at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_91]
  at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_91]
  at [?:1.8.0_91]

First, does this directory exist: /opt/imply/sample-data? Try /opt/imply/sample-data/ instead?
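The IllegalArgumentException from commons-io means the directory check on baseDir failed before any file was read. A hypothetical pre-flight check you could run before posting the task (the helper name and messages are mine, not from Druid) might look like:

```shell
# check_firehose_dir DIR FILTER
# Hypothetical sanity check mirroring what the local firehose expects:
# DIR must be a readable directory containing at least one file matching FILTER.
check_firehose_dir() {
  dir="$1"; filter="$2"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  [ -r "$dir" ] || { echo "not readable: $dir" >&2; return 1; }
  # Intentional glob expansion; if nothing matches, the pattern stays literal.
  set -- "$dir"/$filter
  [ -e "$1" ] || { echo "no files matching $filter in $dir" >&2; return 1; }
}

# Example: verify the quickstart sample dir before running post-index-task.
check_firehose_dir /opt/imply/sample-data 'retail*' || echo "fix baseDir first"
```

If the check fails on the NFS mount but passes on a plain local path, that points at the mount (or its permissions) rather than at Druid.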

Second, are you planning to index all your data on one single machine? This might work for a dev cycle, but for production you would need to submit such a task to a Hadoop cluster and use the MapReduce batch task. I am pretty sure that will not work either, since a MapR cluster uses a proprietary file system rather than HDFS. So the best way to go is to implement the Druid deep-storage interfaces to talk to the MapR file system.

Good luck!

Hi Charan, can you try loading static files following this tutorial?

In general I think it will be easier than debugging the local firehose.

Hi Charan, my apologies, I misread and realized you’ve already done the local quickstart. It seems like the ingestion found the file but isn’t able to actually read it. I’ll dig a bit more into this, but for starters make sure you have the correct permissions to access the file.

I think there is an issue with the configuration or the network on the MapR cluster. I tried the steps below on another cluster and it worked fine:

Install the mapr-loopbacknfs client on the nodes:

yum install mapr-loopbacknfs

cp /opt/mapr/conf/nfsserver.conf /usr/local/mapr-loopbacknfs/conf/

cp /opt/mapr/conf/mapr-clusters.conf /usr/local/mapr-loopbacknfs/conf/

service mapr-loopbacknfs start

mkdir /mapr

mount localhost:/mapr /mapr

df -P
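After the steps above, it may be worth confirming the mount is actually live before pointing Druid at it. A small sketch (the helper name is mine; it checks the kernel mount table rather than parsing df output):

```shell
# is_mounted PATH - succeed if PATH appears as a mount point in /proc/mounts.
is_mounted() {
  awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/mounts
}

# Example: check the loopback-NFS mount created above.
if is_mounted /mapr; then
  echo "/mapr is mounted"
else
  echo "/mapr is NOT mounted; check the mapr-loopbacknfs service" >&2
fi
```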

Property changes at the Druid end: ,,;create=true
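The exact property lines above did not survive posting. For reference, local deep storage on an NFS mount is typically configured with something like the following (the paths and values here are illustrative, not the original poster's):

```properties
# Illustrative only - the original values were lost in posting.
# Deep storage on the loopback-NFS mount:
druid.storage.type=local
druid.storage.storageDirectory=/mapr/my.cluster.com/druid/segments

# Derby metadata store (the trailing ";create=true" fragment above
# suggests a Derby connectURI was among the stripped properties):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
```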

But one doubt: using local storage, will it still give us the power of MapReduce while ingesting data (assuming we deploy multiple MiddleManagers), since it is not going via the Hadoop route?