Using S3 as deep storage makes the datasource not fully available

Hi,

Following the docs, I installed Druid 0.11.0 on a single machine, configured S3 as deep storage in the common.runtime.properties file, and left the remaining configuration at its defaults.
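For context, the S3-related part of my common.runtime.properties looks roughly like this (a sketch with placeholder credentials; property names are the standard ones from the Druid S3 deep storage docs):

```properties
# Load the S3 deep storage extension
druid.extensions.loadList=["druid-s3-extensions"]

# Store segments in S3
druid.storage.type=s3
druid.storage.bucket=druid-historical-data
druid.storage.baseKey=druid/segments
druid.s3.accessKey=<access-key>
druid.s3.secretKey=<secret-key>
```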

I ran an ingestion task with:

datasource name: test-datasource

S3 bucket name: druid-historical-data

Here are some lines from the log:

Retrying request with “AWS4-HMAC-SHA256” signing mechanism: PUT https://druid-historical-data.s3.amazonaws.com:443/druid/segments/test-datasource/2018-01-18T00%3A00%3A00.000Z_2018-01-19T00%3A00%3A00.000Z/2018-02-12T08%3A05%3A07.022Z/0/index.zip HTTP/1.1

2018-02-12T08:05:16,596 WARN [appenderator_merge_0] org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrying request following error response: PUT '/druid/segments/test-datasource/2018-01-18T00:00:00.000Z_2018-01-19T00:00:00.000Z/2018-02-12T08:05:07.022Z/0/index.zip' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Mon, 12 Feb 2018 08:05:16 GMT, x-amz-content-sha256: c8a74ce16ee9ea2a98b806b9c49328e0a1d87d746b554eff1f985dc94d2435fb, x-amz-acl: bucket-owner-full-control, x-amz-meta-md5-hash: e90d12e91c05fb7e616f90ad4ae7691f, Content-Type: application/zip, Content-MD5: 6Q0S6RwF+35hb5CtSudpHw==, Authorization: AWS AKIAIT7UT2L25ZBBXF5Q:94O8ZoE1sonVVKmeZQ7FSTJXzu8=], Response Headers: [x-amz-request-id: 0FAC34C36D591AEB, x-amz-id-2: 6oG4sDKt2wvLdeWrvh+i8seDxVnfGEIIUXfvZXqupyrDUO0uPgGHJNLqrLtdA/u6Q8GLF0VZ97M=, x-amz-region: ap-south-1, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 12 Feb 2018 08:05:18 GMT, Connection: close, Server: AmazonS3]
2018-02-12T08:05:16,668 WARN [appenderator_merge_0] org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrying request after automatic adjustment of Host endpoint from "druid-historical-data.s3.amazonaws.com" to "druid-historical-data.s3-ap-south-1.amazonaws.com" following request signing error using AWS request signing version 4: PUT https://druid-historical-data.s3-ap-south-1.amazonaws.com:443/druid/segments/test-datasource/2018-01-18T00%3A00%3A00.000Z_2018-01-19T00%3A00%3A00.000Z/2018-02-12T08%3A05%3A07.022Z/0/index.zip HTTP/1.1
2018-02-12T08:05:16,668 WARN [appenderator_merge_0] org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrying request following error response: PUT '/druid/segments/test-datasource/2018-01-18T00:00:00.000Z_2018-01-19T00:00:00.000Z/2018-02-12T08:05:07.022Z/0/index.zip' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Mon, 12 Feb 2018 08:05:16 GMT, x-amz-content-sha256: c8a74ce16ee9ea2a98b806b9c49328e0a1d87d746b554eff1f985dc94d2435fb, x-amz-acl: bucket-owner-full-control, x-amz-meta-md5-hash: e90d12e91c05fb7e616f90ad4ae7691f, Content-Type: application/zip, Content-MD5: 6Q0S6RwF+35hb5CtSudpHw==, Host: druid-historical-data.s3.amazonaws.com, x-amz-date: 20180212T080516Z, Authorization: AWS4-HMAC-SHA256 Credential=AKIAIT7UT2L25ZBBXF5Q/20180212/us-east-1/s3/aws4_request,SignedHeaders=content-md5;content-type;date;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-meta-md5-hash,Signature=37d9f4299766bb2f44c8a3cb65c6e7df5daf6e62c6909f3a5480fdc23fc336ad], Response Headers: [x-amz-request-id: C6381E720051AEE3, x-amz-id-2: YSIYO/PAZ6A7yJSjCJW0QqffwrA/24lEpcwD8bsiuI2IBB5kIp1iaK4bO0bOZSQajbxzp/RFPuI=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 12 Feb 2018 08:05:17 GMT, Connection: close, Server: AmazonS3]

If I understood these errors right, the segments should not have made it to the S3 bucket, right? But I can clearly see the descriptor.json and index.zip files in that test-datasource folder.

Also, in the coordinator console, test-datasource is permanently shown in red.

I have also attached the full ingestion log file.

Is there something I have missed? Please help me out.

–THANKS

ingestion.log (152 KB)

Can anyone look into this? I can post additional information if needed.

THANKS–

It seems the S3 bucket's region is the issue.
Earlier, I created the bucket in the Asia Pacific (Mumbai) region, and that is where the problem occurs.
But when I created a bucket in US West (N. California), everything works fine.

So, what can I do to make Druid work properly with buckets in the Mumbai region? (Yes, we want the bucket there to reduce latency.)

–THANKS

Hi Sunil,

Currently Druid uses jets3t for communication with S3 (although we will probably change to aws-java-sdk soon due to compatibility issues like this with jets3t).

In the meantime, try checking https://jets3t.s3.amazonaws.com/toolkit/configuration.html for options you can specify in a jets3t.properties file on your classpath (i.e. conf/druid/_common/jets3t.properties). I think setting s3service.s3-endpoint to your bucket's regional endpoint might help. The endpoints are listed at https://docs.aws.amazon.com/general/latest/gr/rande.html.
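A minimal conf/druid/_common/jets3t.properties for a Mumbai-region bucket might look like this sketch (the endpoint value is taken from the AWS endpoints list; the signature-version option is also documented on the jets3t configuration page, though I haven't verified it fixes this particular case):

```properties
# Point jets3t directly at the bucket's regional endpoint (ap-south-1 = Mumbai)
s3service.s3-endpoint=s3.ap-south-1.amazonaws.com

# Ask jets3t to sign requests with AWS Signature Version 4,
# which newer regions like ap-south-1 require
storage-service.request-signature-version=AWS4-HMAC-SHA256
```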

I tried changing the endpoint but hit another issue with the authentication mechanism. The Mumbai region requires AWS Signature Version 4, which jets3t doesn't fully support. Since you mentioned that you have already thought of replacing jets3t, I presume there is ongoing development?

Hi Sunil,

Yes- the pull request is at https://github.com/druid-io/druid/pull/5382.