java.lang.IllegalArgumentException: Parameter 'directory' is not a directory:

Hi All,

We are trying to load a TSV file into a Druid cluster set up with Ambari. Before doing so, we did the same on a single node on a local machine and everything worked fine. On the Ambari cluster it gives the error shown below.

2018-10-14T12:22:12,321 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_etlranker_2018-10-14T12:22:08.103Z, type=index, dataSource=etlranker}]
java.lang.IllegalArgumentException: Parameter 'directory' is not a directory: /home/druid/DFP_SESReport/2018/04/02
at org.apache.commons.io.FileUtils.validateListFilesParameters(FileUtils.java:536) ~[commons-io-2.5.jar:2.5]
at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:512) ~[commons-io-2.5.jar:2.5]
at io.druid.segment.realtime.firehose.LocalFirehoseFactory.initObjects(LocalFirehoseFactory.java:82) ~[druid-server-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:57) ~[druid-api-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:46) ~[druid-api-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:268) ~[druid-indexing-service-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:188) ~[druid-indexing-service-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.1.2.6.5.0-292.jar:0.10.1.2.6.5.0-292]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
2018-10-14T12:22:12,323 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_etlranker_2018-10-14T12:22:08.103Z] status changed to [FAILED].
2018-10-14T12:22:12,327 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_etlranker_2018-10-14T12:22:08.103Z",
"status" : "FAILED",
"duration" : 37
}

The JSON ingestion spec and the full error logs are attached. We have spent hours trying to fix this with no luck.

Any help will be appreciated.

Regards,

Chethan G Puttaswamy


[error_log.txt|attachment](upload://iqDD4aMppWVWdlxflHy6vFqKTP2.txt) (67.4 KB)



[import.json|attachment](upload://oWDl0oddHTiTN5YvKVxPQ6U16xn.json) (2.71 KB)

Looking over the error message and the docs for the local firehose, my best guess is that it worked locally because the input files were available on that machine at /home/druid/DFP_SESReport/2018/04/02, whereas the cluster node that runs the task does not have that directory.

The docs at: http://druid.io/docs/latest/ingestion/firehose.html are not clear on what “This Firehose can be used to read the data from files on local disk.” actually means in a clustered environment.

Without further input on where the files should live in a multi-server setup, you could try making sure the files are available at /home/druid/DFP_SESReport/2018/04/02 on each server.
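To see which servers actually have the directory, something like the following could be run on each node as the user the Druid task runs under. This is only a minimal sketch: `check_base_dir` is a name I made up, and the result strings are mine, not Druid's. As far as I can tell from the stack trace, the commons-io check fails whenever the path is not an existing directory, so a missing path and a plain file would both produce the same "is not a directory" message.

```python
import os


def check_base_dir(path):
    """Diagnose why a local baseDir might be rejected on this host.

    Hypothetical helper, not part of Druid. It separates the cases that
    the single "is not a directory" error message lumps together.
    """
    if not os.path.exists(path):
        # Also reported when a parent directory cannot be traversed.
        return "missing or not reachable"
    if not os.path.isdir(path):
        return "exists but is not a directory"
    if not os.access(path, os.R_OK | os.X_OK):
        # The current user cannot list the directory contents.
        return "directory is not readable by this user"
    return "ok"
```

For example, `print(check_base_dir("/home/druid/DFP_SESReport/2018/04/02"))` on each server should show quickly where the path is absent.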

Note that the docs also state that “local” mode “…can be used for POCs to ingest data on disk” leading me to think it may not be suitable for a clustered/production environment.

Dyana

I fixed this. Place the data files on the datanode machine(s) and set the directory as shown below. Make sure the "baseDir" path, "/home/druidadmin/druiddata/", exists on the datanode(s).

"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "local",
    "baseDir": "/home/druidadmin/druiddata/",
    "filter": "*.csv"
  },
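Before resubmitting the task, it may help to confirm on the datanode that the "baseDir" and "filter" pair really matches some files. The sketch below is an approximation, not Druid's own code: `matching_files` is a hypothetical helper, the `"*"` fallback for a missing filter is my own choice, and I am assuming the local firehose searches subdirectories and matches file names case-sensitively. Python's `fnmatch` also understands `[...]` patterns, which the firehose filter may not.

```python
import fnmatch
import os


def matching_files(firehose):
    """List files under firehose["baseDir"] whose names match firehose["filter"].

    `firehose` is the dict found under "ioConfig" -> "firehose" in the spec.
    Hypothetical helper for a pre-flight check; it only approximates how
    the local firehose selects its input files.
    """
    base_dir = firehose["baseDir"]
    pattern = firehose.get("filter", "*")  # fallback is an assumption
    if not os.path.isdir(base_dir):
        raise ValueError("baseDir is not a directory on this host: " + base_dir)
    matches = []
    # Walk subdirectories too (assumed to mirror the firehose behaviour).
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Calling it with `{"type": "local", "baseDir": "/home/druidadmin/druiddata/", "filter": "*.csv"}` returns the matched paths. An empty list would suggest the filter does not fit the files, for instance "*.csv" against TSV files named "*.tsv".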

Thanks,

-Madhu