cause:java.io.IOException: No input paths specified in job

Team,

I'm trying to test a Hadoop ingestion task to push our historical data into Druid. I created an xxx-druid bucket in S3, uploaded a few CSV files under the y=2016/m=02/d=29 directory, and ran a Hadoop ingestion task for batch ingestion with the "inputSpec" type set to granularity. The task fails with "No input paths specified in job".

Any suggestions/help appreciated.

From the task log:

2016-05-19T20:19:52,361 INFO [task-runner-0-priority-0] io.druid.indexer.path.GranularityPathSpec - Checking path[s3n://rms-druid/y=2016/m=02/d=27]
2016-05-19T20:19:52,403 INFO [task-runner-0-priority-0] io.druid.indexer.path.GranularityPathSpec - Checking path[s3n://rms-druid/y=2016/m=02/d=28]
2016-05-19T20:19:52,444 INFO [task-runner-0-priority-0] io.druid.indexer.path.GranularityPathSpec - Checking path[s3n://rms-druid/y=2016/m=02/d=29]
2016-05-19T20:19:52,541 INFO [task-runner-0-priority-0] org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2016-05-19T20:19:52,542 INFO [task-runner-0-priority-0] org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-05-19T20:19:52,566 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-05-19T20:19:52,569 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-05-19T20:19:52,574 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/home/ubuntu/imply/imply-1.2.0/var/hadoop-tmp/mapred/staging/ubuntu104415565/.staging/job_local104415565_0001
2016-05-19T20:19:52,574 WARN [task-runner-0-priority-0] org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: No input paths specified in job
2016-05-19T20:19:52,576 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_rac_2016-05-19T20:19:44.370Z, type=index_hadoop, dataSource=rac}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0.jar:0.9.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]

Config file:

{
    "type": "index_hadoop",
    "spec": {
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "granularity",
                "dataGranularity": "DAY",
                "inputPath": "s3n://xxx-druid/",
                "filePattern": "\\*.csv",
                "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd"
            }
        },
        "dataSchema": {
            "dataSource": "rac",
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "DAY",
                "queryGranularity": "NONE",
                "intervals": [
                    "2016-02-20/2016-03-01"
                ]
            },
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "csv",
                    "columns": [
                        ...
                    ],
                    "dimensionsSpec": {
                        "dimensions": [
                            ...
                        ],
                        "dimensionExclusions": [],
                        "spatialDimensions": []
                    },
                    "timestampSpec": {
                        "format": "auto",
                        "column": "timestamp"
                    }
                }
            },
            "metricsSpec": [
                ...
            ]
        },
        "tuningConfig": {
            "type": "hadoop",
            "partitionsSpec": {
                "type": "hashed",
                "targetPartitionSize": 5000000
            },
            "jobProperties": {
                "fs.s3.awsAccessKeyId": "YOUR_ACCESS_KEY",
                "fs.s3.awsSecretAccessKey": "YOUR_SECRET_KEY",
                "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
                "fs.s3n.awsAccessKeyId": "XXXXXXXX",
                "fs.s3n.awsSecretAccessKey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
                "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
                "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
            }
        }
    }
}
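
For context on the "Checking path[...]" log lines above: the granularity inputSpec walks the configured intervals in dataGranularity steps, formats each step with pathFormat, and checks the resulting directory under inputPath. Here is a minimal sketch of that expansion using Joda-Time (which Druid bundles) with the values from the spec above; this is an illustration, not Druid's actual GranularityPathSpec code:

import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class PathExpansionSketch {
    public static void main(String[] args) {
        // Values taken from the ioConfig above.
        String inputPath = "s3n://xxx-druid/";
        DateTimeFormatter pathFormat = DateTimeFormat.forPattern("'y'=yyyy/'m'=MM/'d'=dd");
        Interval interval = Interval.parse("2016-02-20/2016-03-01");

        // Step through the interval one day at a time (dataGranularity = DAY)
        // and print each candidate directory, mirroring the task log.
        for (DateTime t = interval.getStart(); t.isBefore(interval.getEnd()); t = t.plusDays(1)) {
            System.out.println("Checking path[" + inputPath + pathFormat.print(t) + "]");
        }
    }
}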

Hi Rahul, this task log seems to be missing most of the stack trace. Can you include the rest of it?

Hi Fangjin,

The issue was in the file pattern. I changed filePattern from "\\*.csv" to ".*" and was able to run the Hadoop ingestion task successfully.
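
For anyone who hits the same error: filePattern is a Java regular expression applied to the files found under each checked path. The JSON string "\\*.csv" is the regex \*.csv, i.e. a literal asterisk followed by any single character and "csv", so it matches no normal file name and Hadoop ends up with zero input paths. A quick standalone check (the S3 key below is made up for illustration):

import java.util.regex.Pattern;

public class FilePatternCheck {
    public static void main(String[] args) {
        // Hypothetical key, shaped like the files uploaded in this thread.
        String path = "s3n://xxx-druid/y=2016/m=02/d=29/part-0001.csv";

        // Original pattern: literal '*', any one character, then "csv" -> no match.
        System.out.println(Pattern.matches("\\*.csv", path));   // false

        // The fix from this thread: match everything.
        System.out.println(Pattern.matches(".*", path));        // true

        // A tighter alternative that still restricts input to CSV files.
        System.out.println(Pattern.matches(".*\\.csv", path));  // true
    }
}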

It seems that Druid throws this error when no files match for indexing… is there a way to change this behavior? For example, by having Druid create an empty segment?