Static-s3 firehose always giving me access denied

Hi,

I’ve been struggling for two days now to ingest data stored in an encrypted S3 bucket, without involving Hadoop, and so far without success :frowning:

To avoid mistakes, all my work is based on one of the tutorial examples: wikipedia-index.json

I’m using Druid 0.14.0-incubating

In order to achieve a successful ingestion, here is what I’ve already done.

1/ I took the "wikiticker-2015-09-12-sampled.json.gz" example file and put it in my bucket (named "feed_test")

2/ Then I took wikipedia-index.json and modified the firehose this way:

"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "static-s3",
    "uris" : ["s3://feed_test/wikiticker-2015-09-12-sampled.json.gz"]
  },
  "appendToExisting" : false
}

and submitted it to my Overlord with curl.
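
For reference, the submission looked roughly like this (the Overlord host is a placeholder for my own; 8090 is the default Overlord port):

# Submit the ingestion spec to the Overlord's task endpoint
curl -X POST -H 'Content-Type: application/json' \
  -d @wikipedia-index.json \
  http://OVERLORD_HOST:8090/druid/indexer/v1/task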

==> My task ended up with the following error:

2019-05-09T09:13:49,589 WARN [firehose_fetch_0] org.apache.druid.java.util.common.RetryUtils - Failed to download object[s3://feed_test/wikiticker-2015-09-12-sampled.json.gz], retrying (2 of 3) in 1,862ms.
java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: DA2987FFE8B695C5; S3 Extended Request ID: HOD/WTd/VncVuHXCJOFBNjVM7hLd5MnOhNE5OSTx6pfjIt4sAa/d/J2hVVzxEvaIvS+5GaSbiU0=), S3 Extended Request ID: HOD/WTd/VncVuHXCJOFBNjVM7hLd5MnOhNE5OSTx6pfjIt4sAa/d/J2hVVzxEvaIvS+5GaSbiU0=
at org.apache.druid.firehose.s3.StaticS3FirehoseFactory.openObjectStream(StaticS3FirehoseFactory.java:186) ~[?:?]
at org.apache.druid.firehose.s3.StaticS3FirehoseFactory.openObjectStream(StaticS3FirehoseFactory.java:60) ~[?:?]
at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory.access$000(PrefetchableTextFilesFirehoseFactory.java:89) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory$1.open(PrefetchableTextFilesFirehoseFactory.java:194) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.data.input.impl.prefetch.FileFetcher.lambda$download$0(FileFetcher.java:97) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:86) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:125) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.data.input.impl.prefetch.FileFetcher.download(FileFetcher.java:95) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.data.input.impl.prefetch.Fetcher.fetch(Fetcher.java:135) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at org.apache.druid.data.input.impl.prefetch.Fetcher.lambda$fetchIfNeeded$0(Fetcher.java:111) ~[druid-core-0.14.0-incubating.jar:0.14.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: DA2987FFE8B695C5; S3 Extended Request ID: HOD/WTd/VncVuHXCJOFBNjVM7hLd5MnOhNE5OSTx6pfjIt4sAa/d/J2hVVzxEvaIvS+5GaSbiU0=)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1638) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1303) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-core-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) ~[aws-java-sdk-s3-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1380) ~[aws-java-sdk-s3-1.11.199.jar:?]
at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.getObject(ServerSideEncryptingAmazonS3.java:89) ~[?:?]
at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.getObject(ServerSideEncryptingAmazonS3.java:84) ~[?:?]
at org.apache.druid.firehose.s3.StaticS3FirehoseFactory.openObjectStream(StaticS3FirehoseFactory.java:179) ~[?:?]
… 13 more

As an access-rights issue is pretty clearly indicated, I did three things:

  • As my deep storage is set on S3 (and that works perfectly), I granted my IAM user full rights on the feed_test S3 bucket.
    ===> access still denied

  • I also tried to extend the instance profile attached to my instances to give them access to the "feed_test" bucket.
    ==> If I log in to one of my EC2 instances and execute "aws s3 cp s3://feed_test/wikiticker-2015-09-12-sampled.json.gz .", this works perfectly.
    When I run my task: access still denied

  • I also ensured that my instance profile has the privilege to decrypt the KMS key: it does (see the policy sketch after this list).
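
For completeness, the instance-profile policy I attached looked roughly like this (a sketch; the KMS key ARN is a placeholder for my own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::feed_test", "arn:aws:s3:::feed_test/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID"
    }
  ]
}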

None of this worked, and I’m running out of clues…

How Druid connects to my S3 bucket is still unclear to me.
Will it use the access key provided in common.runtime.properties (the one used to access the deep storage S3 bucket)? Or should I provide something else in my index JSON?

If someone has a few minutes to help me, that would be kind :slight_smile:

Regards

Damn!

So I dug into the code and found the answer:

Druid uses the access key and secret key you provide in runtime.properties.

And my IAM user was missing the kms:Decrypt permission needed to read from the bucket…
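
Concretely, these are the properties it picks up (a sketch from my config; the values are placeholders, and the IAM user behind them also needs kms:Decrypt on the bucket’s key):

# common.runtime.properties
druid.s3.accessKey=<my access key>
druid.s3.secretKey=<my secret key>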

Nice find. I ended up giving all permissions to the role, but I will try this out.

Hi Karthik,

Actually, I ended up removing the access/secret keys from my runtime properties and using the IAM profile policies attached to my instances.

I thought that was not really working until I saw a comment in another thread pointing out that the IAM role is the last authentication method checked (so if Druid finds an access key in the properties files, it doesn’t use the IAM role).
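
In practice, that just meant deleting these lines from common.runtime.properties so that the credential lookup falls through to the instance profile (a sketch of my change; the values were redacted anyway):

# common.runtime.properties
# Removed so the credential chain falls through to the
# EC2 instance profile, which is checked last:
# druid.s3.accessKey=<my access key>
# druid.s3.secretKey=<my secret key>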

Now it works fine, I’m spared maintaining an IAM user, and my setup is cleaner :slight_smile:

Hi Guillaume,

Sorry, I just saw your question. Glad that you figured it out.

Looks like this is worth documenting. Perhaps here: http://druid.io/docs/latest/development/extensions-core/s3.html.

Are you interested in contributing? You can raise a PR if you want. Please check https://druid.apache.org/community/#contributing for details.

Jihoon

Hi,

Yes, I think I’ll take a look at this and raise a PR :slight_smile:

Thank you!

Just for tracking:

PR : https://github.com/apache/incubator-druid/pull/7674