S3 deep storage setup errors

Hi All,

I am getting some errors while trying to ingest the sample wiki data. The job is trying to read the data from S3.
We have verified that the user has access to read from and write to the S3 bucket.

wikiticker-index.json (2.18 KB)

Can you try adding the following to your existing jobProperties?

"fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",

Reference docs: https://imply.io/docs/latest/ingestion-batch
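For context, that property goes inside the Hadoop task's tuningConfig. A minimal sketch of the placement (values are placeholders, and the key/secret lines are only needed if you are not relying on IAM roles):

  "tuningConfig" : {
    "type" : "hadoop",
    "jobProperties" : {
      "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
      "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
      "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
    }
  }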

Hi Sunil,

Did you figure this out? We are having similar issues. Anything you can share would be nice.

Thanks,

For anyone who cares: you need the hadoop-aws jar (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.6.0) in the lib folder for this to work.
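Assuming the standard artifact name from that link (and that your Hadoop client is also 2.6.x, since the hadoop-aws version should match it), the jar ends up as:

  lib/hadoop-aws-2.6.0.jar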

Hi Pritesh,

We resolved it by using the regular indexing task instead of the Hadoop indexing task.
Also, we wanted to make use of the IAM roles assigned to the EC2 instances rather than access keys and secret keys.
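Roughly, a regular (non-Hadoop) index task for the wikiticker sample looks like the sketch below; the schema, interval and local file path here are illustrative, not our exact spec:

  {
    "type" : "index",
    "spec" : {
      "dataSchema" : {
        "dataSource" : "wikiticker",
        "parser" : {
          "type" : "string",
          "parseSpec" : {
            "format" : "json",
            "timestampSpec" : { "column" : "time", "format" : "iso" },
            "dimensionsSpec" : { "dimensions" : ["channel", "page", "user"] }
          }
        },
        "metricsSpec" : [ { "type" : "count", "name" : "count" } ],
        "granularitySpec" : {
          "type" : "uniform",
          "segmentGranularity" : "day",
          "queryGranularity" : "none",
          "intervals" : ["2015-09-12/2015-09-13"]
        }
      },
      "ioConfig" : {
        "type" : "index",
        "firehose" : {
          "type" : "local",
          "baseDir" : "quickstart",
          "filter" : "wikiticker-2015-09-12-sampled.json"
        }
      },
      "tuningConfig" : { "type" : "index" }
    }
  }

With S3 configured as deep storage in common.runtime.properties, the segments this task produces are pushed to the bucket using the instance's IAM role rather than keys.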

Regards
Sunil

Hi Sunil,

We are new to Druid and are facing a similar issue while writing the segment files to S3.

Please let us know how the IAM role is used in the properties instead of the access and secret keys. Also, can you attach the JSON where you used regular indexing instead of Hadoop indexing?

Thanks,

Ravali

Hi Ravali,

The access keys and secret keys (druid.s3.accessKey / druid.s3.secretKey) were left out of common.runtime.properties, so credentials come from the instance's IAM role; we included only the following.

For S3 deep storage:

druid.storage.type=s3
druid.storage.bucket=abcd
druid.storage.baseKey=/druid/segments

For indexing logs (also pushed to S3):

druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=abcd
druid.indexer.logs.s3Prefix=druid/indexing-logs
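
One thing worth double-checking: the S3 extension has to be in the extensions load list in the same file, otherwise the s3 storage and log types are not available. Something like:

druid.extensions.loadList=["druid-s3-extensions"]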

Also, we placed jets3t.properties on the Druid classpath (under the conf/druid/_common folder):

s3service.https-only=true
s3service.s3-endpoint=s3.amazonaws.com
s3service.s3-endpoint-https-port=443
s3service.server-side-encryption=AES256

Hope this helps!

Sunil