How to use "granularity" in InputSpec for Hadoop Batch Ingestion when the pathFormat is other than y=XXXX/m=XX/d=XX/H=XX/M=XX/S=XX

Hi,

I am trying to perform Hadoop Batch Ingestion, and my file paths follow the format `YYYY-MM-dd-HH`, not the more common `y=XXXX/m=XX/d=XX/H=XX/M=XX/S=XX`.

How do I use "granularity" in the inputSpec for Hadoop Batch Ingestion when the pathFormat is something other than `y=XXXX/m=XX/d=XX/H=XX/M=XX/S=XX`?

I am using the spec below and getting a "NoInputPath" error.

    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:237) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:375) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]  
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) ~[hadoop-mapreduce-client-core-2.4.0.jar:?]  
    at java.security.AccessController.doPrivilege

    "granularitySpec" : {
      "type" : "uniform",
      "segmentGranularity" : "HOUR",
      "queryGranularity" : "NONE",
      "intervals" : [ "2016-01-01T00/2016-01-01T02" ]
    }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "metadataUpdateSpec" : {
        "type" : "mysql",
        "connectURI" : "jdbc:mysql://localhost:3306/druid",
        "password" : "druid",
        "segmentTable" : "druid_segments",
        "user" : "druid"
      },
      "segmentOutputPath" : "s3://xxxxxx/xxxx/xxxx/output",
      "inputSpec" : {
        "type" : "granularity",
        "dataGranularity" : "hour",
        "inputPath" : "s3://xxxx/inputData/",
        "pathFormat" : "'y'=yyyy-'m'=MM-'d'=dd-'H'=HH",
        "filePattern" : "part*.gz"
      }
    }

I'm not 100% sure, as I've never tried this variant myself, but I think the individual date parts have to be in separate subfolders: first year, then month, and so on.
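If that's right, the two layouts differ like this. A minimal sketch (the helper and the example interval are mine, just to illustrate the per-hour paths each layout would produce):

```python
from datetime import datetime, timedelta

def hourly_buckets(start, end):
    """Yield hourly timestamps in [start, end)."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=1)

start = datetime(2016, 1, 1, 0)
end = datetime(2016, 1, 1, 2)

# Nested layout the default granularity pathFormat expects:
nested = [t.strftime("y=%Y/m=%m/d=%d/H=%H") for t in hourly_buckets(start, end)]
# Flat layout the original files actually use:
flat = [t.strftime("%Y-%m-%d-%H") for t in hourly_buckets(start, end)]

print(nested)  # ['y=2016/m=01/d=01/H=00', 'y=2016/m=01/d=01/H=01']
print(flat)    # ['2016-01-01-00', '2016-01-01-01']
```

If the ingestion job is looking for the nested directories while the data sits in flat per-hour folders, that would explain the "NoInputPath" error.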

Check the Granularity.java source file; it shows the path pattern expected for each granularity. If you look at each section, say Hour or Day, you can get an idea of the expected layout.

Thanks,

I changed the spec to use a static inputSpec with wildcard characters in the path, and it worked.

    "segmentOutputPath" : "s3://anindit-druid-datalytics/data/index/output",
    "inputSpec" : {
      "type" : "static",
      "paths" : "s3://anindit-druid-datalytics/inputData/2016-01-*-*/part-r-0000*.gz"
    }