Batch ingestion "granularity" inputSpec

Hi guys,

I’m trying to configure batch ingestion to read files from HDFS, from folders of the form y=xxxx/m=xx/d=xx.

I have this piece of config:

"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
    "type": "granularity",
    "dataGranularity": "DAY",
    "inputPath": "hdfs:///data/output",
    "filePattern": "\*\.gz"
  }
},

and the files are in HDFS at /data/output/y=2015/m=11/d=01/part-xxxxx.gz:

hadoop@i-88de7e31:~$ hadoop fs -ls /data/output/y=2015/m=11/d=01
Found 21 items
-rw-r--r-- 3 hadoop supergroup 244570257 2015-12-18 10:09 /data/output/y=2015/m=11/d=01/part-00000.gz
-rw-r--r-- 3 hadoop supergroup 199030241 2015-12-18 10:09 /data/output/y=2015/m=11/d=01/part-00001.gz

However, in the logs I get:

[INFO ] 2015-12-18 10:16:32.535 [task-runner-0] GranularityPathSpec - Checking path[hdfs:/data/output/y=2015/m=11/d=01]
[INFO ] 2015-12-18 10:16:32.826 [task-runner-0] GranularityPathSpec - Checking path[hdfs:/data/output/y=2015/m=11/d=01]

Caused by: java.lang.RuntimeException: java.io.IOException: No input paths specified in job

Caused by: java.io.IOException: No input paths specified in job

Regex issue? Did you mean: .*\.gz
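To see why the posted pattern selects nothing, here is a rough sketch. It assumes filePattern is treated as a regular expression that has to match the whole file path (I have not checked that against the Druid source), and it uses Python's re module as a stand-in for the Java regex engine; for these two patterns the behaviour should be the same. The select_inputs helper is made up for illustration, and the paths are built from the listing and the log above.

```python
import re

# Hypothetical helper (not part of Druid): keep only the paths that the
# pattern matches in full, which is the assumed behaviour of filePattern.
def select_inputs(file_pattern, paths):
    matcher = re.compile(file_pattern)
    return [p for p in paths if matcher.fullmatch(p)]

paths = [
    "hdfs:/data/output/y=2015/m=11/d=01/part-00000.gz",
    "hdfs:/data/output/y=2015/m=11/d=01/part-00001.gz",
]

# Pattern as posted: "\*" is a literal asterisk, so only a path that is
# exactly "*.gz" would match.
broken = select_inputs(r"\*\.gz", paths)

# Suggested pattern: ".*" is any run of characters, then a literal ".gz".
fixed = select_inputs(r".*\.gz", paths)

print(len(broken), len(fixed))
```

With the posted pattern no file is selected, which would be consistent with the "No input paths specified in job" error; with .*\.gz both part files are picked up.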

Yes, that worked. My bad. :)

Thanks.