S3 compatibility issues in new AWS regions

Hi,

We are facing issues with Druid when using S3 as both deep storage and the input path for data in newer AWS regions, in this case Mumbai (ap-south-1). Let me explain in detail:

Druid version - 0.11.0

Deep Storage - S3

Input source - S3

Job - Batch ingestion using Hadoop
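For reference, deep storage is pointed at S3 through common.runtime.properties, roughly along the lines below (druid-s3-extensions is loaded; the bucket names and key prefixes here are placeholders, not our actual values):

# Deep storage on S3
druid.storage.type=s3
druid.storage.bucket=<segments_bucket>
druid.storage.baseKey=druid/segments
druid.s3.accessKey=<access_key>
druid.s3.secretKey=<secret_key>

# Indexing task logs on S3
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=<logs_bucket>
druid.indexer.logs.s3Prefix=druid/indexing-logs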

Job config -

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "zai-v2-ind",
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "input_timestamp",
            "format" : "millis"
          },
          "dimensionsSpec" : {
            "dimensions" : [
              "device_brand",
              "device_devicemodel",
              "device_deviceos",
              "device_devicetype",
              ....<other attributes>
            ],
            "dimensionExclusions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "name": "unique_users",
          "type": "hyperUnique",
          "fieldName": "unique_user"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "HOUR",
        "queryGranularity" : "MINUTE",
        "intervals" : [ "2018-02-01/2018-02-10" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "s3://<input_path>"
      }
    },
    "tuningConfig" : {
      "type": "hadoop",
      "ignoreInvalidRows": "true",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties" : {
        "mapreduce.job.user.classpath.first": "true",
        "mapreduce.map.java.opts": "-server -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.reduce.java.opts": "-server -Xmx3072m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.map.memory.mb" : 1024,
        "mapreduce.reduce.memory.mb" : 4096,
        "mapreduce.job.reduces" : 200,
        "mapreduce.map.speculative" : false,
        "mapreduce.reduce.speculative" : false,
        "fs.s3.awsAccessKeyId" : "<>",
        "fs.s3.awsSecretAccessKey" : "<>",
        "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId" : "<>",
        "fs.s3n.awsSecretAccessKey" : "<>",
        "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
      }
    }
  },
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3", "org.apache.hadoop:hadoop-aws:2.7.3"]
}

Deep storage is configured on S3 in the same region as the input path. When deployed in the Ireland region (eu-west-1), the job runs smoothly without any issues, but when run in the India region (ap-south-1), the following issues come up:

1. Druid is not able to read files from S3:

Caused by: java.io.IOException: s3n://<bucket_name> : 400 : Bad Request
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453) ~[?:?]
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427) ~[?:?]
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411) ~[?:?]
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181) ~[?:?]
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_161]
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_161]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_161]
  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_161]
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) ~[?:?]
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[?:?]
  at org.apache.hadoop.fs.s3native.$Proxy211.retrieveMetadata(Unknown Source) ~[?:?]
  at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:530) ~[?:?]
  at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69) ~[?:?]
  at org.apache.hadoop.fs.Globber.glob(Globber.java:217) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115) ~[?:?]
  at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[?:?]
  at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318) ~[?:?]
  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) ~[?:?]

2. After configuring Signature Version 4 signing (see https://groups.google.com/forum/#!topic/druid-user/i3qK0u5BDGM), another exception is seen:

Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_161]
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_161]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_161]
  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_161]
  at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.11.0.jar:0.11.0]
  … 7 more
Caused by: java.lang.NoSuchMethodError: com.amazonaws.AmazonWebServiceRequest.copyPrivateRequestParameters()Ljava/util/Map;
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3506) ~[?:?]
  at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) ~[?:?]
  at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) ~[?:?]
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) ~[?:?]
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) ~[?:?]
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) ~[?:?]
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) ~[?:?]
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:500) ~[?:?]
  at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:110) ~[?:?]
  at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[?:?]

My jets3t.properties -

s3service.s3-endpoint=s3.ap-south-1.amazonaws.com
storage-service.request-signature-version=AWS4-HMAC-SHA256
uploads.stream-retry-buffer-size=2147483646
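In case it matters, the second stack trace goes through the s3a connector (S3AFileSystem) rather than jets3t. I assume the equivalent endpoint/credential settings for s3a would look roughly like the following (I have not verified that these resolve the problem), either in core-site.xml or passed as jobProperties:

fs.s3a.endpoint=s3.ap-south-1.amazonaws.com
fs.s3a.access.key=<access_key>
fs.s3a.secret.key=<secret_key>

and, from what I have read, V4 signing for the older AWS SDK is enabled with -Dcom.amazonaws.services.s3.enableV4=true added to mapreduce.map.java.opts and mapreduce.reduce.java.opts.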

I have also observed that Druid is able to write data (both indexing logs and segments) to the buckets in India, but it is not able to read from them.

I have hit a dead end at this point, so any direction or solution is welcome.

Thanks,

Chaitanya

Hi Chaitanya,

Would you try adding "mapreduce.job.classloader": "true" to your jobProperties?
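That is, something like this in your tuningConfig (only the relevant part shown; "..." stands for your existing properties):

"tuningConfig" : {
  "type": "hadoop",
  "jobProperties" : {
    "mapreduce.job.classloader": "true",
    ...
  }
}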

Jihoon

On Mon, Mar 5, 2018 at 11:00 PM, chaitanya.bendre@zeotap.com wrote: