Batch file ingestion from S3 to Druid on my local machine

Hi all,

I want to test Druid's compatibility with S3.

I have a JSON file sitting in my S3 folder, but I am facing an issue while trying to ingest it.

**The steps I have followed are below:**

  • I have included druid-s3-extensions in the loadList in the file common.runtime.properties.
  • I have enabled the S3 configurations for deep storage and indexing logs, and disabled the local and HDFS configurations. Below are my configurations in the file common.runtime.properties:

Deep storage

For S3:

druid.storage.type=s3
druid.storage.bucket=mybucketname
druid.storage.baseKey=folder path where the file will be present
druid.s3.accessKey=my access key
druid.s3.secretKey=my secret key
druid.storage.sse.type=s3

Indexing service logs

For S3:

druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=mylogbucketname
druid.indexer.logs.s3Prefix=folder path where the log files should be there

  • My indexing-s3-task.json file looks like this:

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "s3n://accesskey:secretkey@bucketname/directory/wikiticker-2015-09-12-sampled.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikipedias3",
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              { "name": "added", "type": "long" },
              { "name": "deleted", "type": "long" },
              { "name": "delta", "type": "long" }
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      }
    }
  }
}

**The error I receive when I submit my “indexing-s3-task.json” task is:**

**Error injecting constructor, java.lang.IllegalArgumentException: Can not create a Path from an empty string**

**I have also attached a screenshot of part of the index task log. Does anyone have any thoughts?**



Thanks,
Anoosha

Hi users,

Can anyone help me with this issue?

Thanks,

Anoosha

You need to remove the HDFS storage extension (druid-hdfs-storage) from your loadList and restart your Druid cluster.
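
As a rough sketch of what that change could look like in common.runtime.properties: the HDFS deep storage extension ships as druid-hdfs-storage, and the "empty string" Path error is presumably raised because that extension is loaded without an HDFS storage directory configured. The "before" list below is an assumption about your current setup, not something taken from your post, so keep any other extensions you actually rely on.

```properties
# common.runtime.properties (sketch; the exact extension list is an assumption)

# Before: HDFS and S3 extensions both loaded
# druid.extensions.loadList=["druid-hdfs-storage", "druid-s3-extensions"]

# After: HDFS extension removed, S3 extension kept
druid.extensions.loadList=["druid-s3-extensions"]
```

The property is read at startup, so the change only takes effect after the Druid services are restarted.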

Rommel Garcia
