Ingest of local data to S3 deep storage failed

Hello,

I’m trying to load the sample wikiticker data into Druid and am getting the following exception. I’m using S3 as deep storage.

2016-08-10T03:46:48,072 WARN [Thread-59] org.apache.hadoop.mapred.LocalJobRunner - job_local826592096_0002 java.lang.Exception: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively). at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?] Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively). at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:61) ~[hadoop-common-2.3.0.jar:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101] at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.s3native.$Proxy191.initialize(Unknown Source) ~[?:?] at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:272) ~[hadoop-common-2.3.0.jar:?] 
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) ~[hadoop-common-2.3.0.jar:?] at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[hadoop-common-2.3.0.jar:?] at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:691) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1] at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1] at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?] at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?] at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_101] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_101] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_101] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_101] at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_101] 2016-08-10T03:46:48,567 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local826592096_0002 failed with state FAILED due to: NA 2016-08-10T03:46:48,572 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 33 File System Counters FILE: Number of bytes read=34215426 FILE: Number of bytes written=17310701 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 Map-Reduce Framework Map input records=39244 Map output records=39244 Map output bytes=16736001 Map output materialized bytes=16892983 Input split bytes=309 Combine input records=0 Combine output records=0 Reduce input groups=0 Reduce shuffle bytes=16892983 Reduce input records=0 Reduce output records=0 Spilled Records=39244 Shuffled Maps =1 Failed Shuffles=0 Merged Map outputs=1 GC time elapsed (ms)=96 CPU time spent (ms)=0 Physical memory (bytes) snapshot=0 Virtual memory (bytes) snapshot=0 Total committed heap usage (bytes)=1912078336 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0 2016-08-10T03:46:48,577 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[var/druid/hadoop-tmp/wikiticker/2016-08-10T034630.036Z/8b44ff31242a48ff96ed6cdbc1547708] 2016-08-10T03:46:48,590 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2016-08-10T03:46:29.980Z, 
type=index_hadoop, dataSource=wikiticker}] java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?] at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1] at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1] at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1] at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_101] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_101] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_101] at java.lang.Thread.run(Thread.java:745) [?:1.7.0_101] Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101] at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101] at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1] ... 7 more Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed! 
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1] at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1] at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101] at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101] at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1] ... 7 more 2016-08-10T03:46:48,596 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wikiticker_2016-08-10T03:46:29.980Z] status changed to [FAILED]. 2016-08-10T03:46:48,598 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { "id" : "index_hadoop_wikiticker_2016-08-10T03:46:29.980Z", "status" : "FAILED", "duration" : 14282 }

The attachments are the full log and the common configuration file.

Thanks!

common.runtime.properties (3.69 KB)

druid.log (180 KB)

Hi,
When reading data from S3, you also need to supply the S3 access keys to your Hadoop configuration, either by setting them in your Hadoop config files or in a jobProperties section under tuningConfig in your task spec file.

For example:

"jobProperties" : {
  "fs.s3.awsAccessKeyId" : "MY_ACCESS_KEY",
  "fs.s3.awsSecretAccessKey" : "MY_SECRET_KEY"
}
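For context, here is a hedged sketch of where that section sits in a HadoopIndexTask spec. The surrounding field names follow the standard index_hadoop task layout and the values are placeholders; note that the error message above specifically asks for the fs.s3n.* properties, so those keys are shown here.

```json
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : { "dataSource" : "wikiticker" },
    "ioConfig" : { "type" : "hadoop" },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : {
        "fs.s3n.awsAccessKeyId" : "MY_ACCESS_KEY",
        "fs.s3n.awsSecretAccessKey" : "MY_SECRET_KEY"
      }
    }
  }
}
```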

We get the same exception after setting the access key and secret access key in jobProperties.

However, when we put :@ before the bucket name, as follows:

druid.storage.bucket=:@

the first exception disappears, but another exception is thrown:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@51194de9
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleS3ServiceException(Jets3tNativeFileSystemStore.java:310) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:89) ~[hadoop-common-2.3.0.jar:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.$Proxy191.storeFile(Unknown Source) ~[?:?]
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:221) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) ~[hadoop-common-2.3.0.jar:?]
        at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:241) ~[?:1.7.0_101]
        at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:360) ~[?:1.7.0_101]
        at io.druid.indexer.JobHelper.zipAndCopyDir(JobHelper.java:511) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:374) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
        at com.sun.proxy.$Proxy192.push(Unknown Source) [?:?]
        at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_101]
Caused by: org.jets3t.service.S3ServiceException: Request Error: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@51194de9
        at org.jets3t.service.S3Service.putObject(S3Service.java:2358) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more
Caused by: java.lang.RuntimeException: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@51194de9
        at org.jets3t.service.utils.SignatureUtils.awsV4GetOrCalculatePayloadHash(SignatureUtils.java:259) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.authorizeHttpRequest(RestStorageService.java:778) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:326) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:1157) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1968) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1889) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1881) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.StorageService.putObject(StorageService.java:840) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2212) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2356) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more
Caused by: java.io.IOException: Resetting to invalid mark
        at java.io.BufferedInputStream.reset(BufferedInputStream.java:437) ~[?:1.7.0_101]
        at org.jets3t.service.utils.ServiceUtils.hash(ServiceUtils.java:238) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.utils.ServiceUtils.hashSHA256(ServiceUtils.java:267) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.utils.SignatureUtils.awsV4GetOrCalculatePayloadHash(SignatureUtils.java:251) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.authorizeHttpRequest(RestStorageService.java:778) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:326) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:1157) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1968) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1889) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1881) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.StorageService.putObject(StorageService.java:840) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2212) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2356) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more


On Thursday, August 11, 2016, at 12:10:46 PM UTC+8, Nishant Bangarwa wrote:

Hi,

Can you try using this jobProperties spec? S3 and S3N need different configurations:

"jobProperties" : {
 "fs.s3.awsAccessKeyId" : "YOUR_ACCESS_KEY",
 "fs.s3.awsSecretAccessKey" : "YOUR_SECRET_KEY",
 "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
 "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
 "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
 "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
 "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
}

I have tried the jobProperties spec. As I understand it, the deep storage configuration is in common.runtime.properties as follows:

For S3:

druid.storage.type=s3

druid.storage.bucket=

druid.storage.baseKey=

druid.s3.accessKey=

druid.s3.secretKey=
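For illustration, a hedged example of those deep-storage properties with hypothetical values filled in (bucket name, base key, and credentials are placeholders, not values from this thread; this also assumes the druid-s3-extensions extension is on the load list):

```properties
# hypothetical placeholder values -- substitute your own
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=MY_ACCESS_KEY
druid.s3.secretKey=MY_SECRET_KEY
```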


The jobProperties spec is used for loading data from S3, but here I am loading local data into S3 deep storage.

I have tried both the deep storage spec in common.runtime.properties and the jobProperties spec with s3n.

On Saturday, August 13, 2016, at 7:34:34 AM UTC+8, Jonathan Wei wrote:

Can you post your indexing task with the jobProperties included?

Hi, what region of AWS is this?

I have tried the sample indexing task with the jobProperties.

The attachment is the index json file.

On Tuesday, August 16, 2016, at 5:40:42 AM UTC+8, Jonathan Wei wrote:

wikiticker-index.json (2.54 KB)

China.

I have set the following in jets3t.properties:

s3service.s3-endpoint=s3.cn-north-1.amazonaws.com.cn
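For a V4-only region like cn-north-1, jets3t 0.9.4 can also be told to use the V4 signing algorithm explicitly. A hedged sketch of a fuller jets3t.properties — the storage-service.request-signature-version property comes from the JetS3t configuration docs, not from this thread:

```properties
s3service.s3-endpoint=s3.cn-north-1.amazonaws.com.cn
# force AWS Signature Version 4 (assumed property name, per JetS3t docs)
storage-service.request-signature-version=AWS4-HMAC-SHA256
```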


Loading data from S3 now works, but writing to S3 deep storage still does not.

On Tuesday, August 16, 2016, at 8:42:20 AM UTC+8, Fangjin Yang wrote:

There are two "tuningConfig" sections in the task JSON; the latter one has an empty "jobProperties" field, which might be overriding the earlier one. Can you try removing it and see if that works?

Thank you for pointing out my mistake. The first exception disappeared, but another exception was thrown:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@3e176669
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleS3ServiceException(Jets3tNativeFileSystemStore.java:310) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:89) ~[hadoop-common-2.3.0.jar:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.$Proxy191.storeFile(Unknown Source) [?:?]
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:221) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) [hadoop-common-2.3.0.jar:?]
        at java.util.zip.DeflaterOutputStream.close(DeflaterOutputStream.java:241) [?:1.7.0_101]
        at java.util.zip.ZipOutputStream.close(ZipOutputStream.java:360) [?:1.7.0_101]
        at io.druid.indexer.JobHelper.zipAndCopyDir(JobHelper.java:511) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:374) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
        at com.sun.proxy.$Proxy192.push(Unknown Source) [?:?]
        at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_101]
Caused by: org.jets3t.service.S3ServiceException: Request Error: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@3e176669
        at org.jets3t.service.S3Service.putObject(S3Service.java:2358) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more
Caused by: java.lang.RuntimeException: Failed to automatically set required header "x-amz-content-sha256" for request with entity org.jets3t.service.impl.rest.httpclient.RepeatableRequestEntity@3e176669
        at org.jets3t.service.utils.SignatureUtils.awsV4GetOrCalculatePayloadHash(SignatureUtils.java:259) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.authorizeHttpRequest(RestStorageService.java:778) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:326) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:1157) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1968) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1889) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1881) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.StorageService.putObject(StorageService.java:840) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2212) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2356) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more
Caused by: java.io.IOException: Resetting to invalid mark
        at java.io.BufferedInputStream.reset(BufferedInputStream.java:437) ~[?:1.7.0_101]
        at org.jets3t.service.utils.ServiceUtils.hash(ServiceUtils.java:238) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.utils.ServiceUtils.hashSHA256(ServiceUtils.java:267) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.utils.SignatureUtils.awsV4GetOrCalculatePayloadHash(SignatureUtils.java:251) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.authorizeHttpRequest(RestStorageService.java:778) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:326) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:1157) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1968) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1889) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1881) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.StorageService.putObject(StorageService.java:840) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2212) ~[jets3t-0.9.4.jar:0.9.4]
        at org.jets3t.service.S3Service.putObject(S3Service.java:2356) ~[jets3t-0.9.4.jar:0.9.4]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:87) ~[hadoop-common-2.3.0.jar:?]
        ... 33 more


On Tuesday, August 16, 2016, at 12:05:44 PM UTC+8, Jonathan Wei wrote:

Seems like that error might be an issue with hadoop/jets3t versions and the China AWS region only supporting V4 authentication:

https://community.cloudera.com/t5/Storage-Random-Access-HDFS/cloudera-does-not-support-access-to-s3-within-eu-frankfurt-Aws/td-p/32369

http://docs.aws.amazon.com/general/latest/gr/signature-version-2.html

Not sure what the resolution to the problem is.

I believe this same problem impacts a few of the new AWS regions as well. The resolution is actually a bit convoluted and will require recompiling Druid with line changes. I believe Hadoop 2.8 may fix a lot of these issues.

Shuai Chang posted a possible solution here:

https://groups.google.com/forum/#!topic/druid-user/i3qK0u5BDGM

I tried that on a batch ingestion task, putting the jets3t.properties file in the hadoop conf directory with core-site.xml, etc., and it was able to read input data from a bucket in the Seoul region.

To add to this, based on additional tests that I ran today:

I was running with -Dhadoop.mapreduce.job.user.classpath.first=true as described here: http://druid.io/docs/latest/operations/other-hadoop.html

I’m also setting my hadoopDependencyCoordinates to use hadoop 2.7.1 libraries

In addition, I had to add the following to jets3t.properties, which I placed in both Druid’s _common directory and the Hadoop config directory:

uploads.stream-retry-buffer-size=67108865

Otherwise, in local mode, I was getting some IOException:

https://bitbucket.org/jmurty/jets3t/issues/228/upload-large-size-file-will-lead-to

On a remote hadoop cluster (sequenceiq 2.7.1 docker image), my job would hang at “map 100%, reduce 100%” without that setting (needs more confirmation)

Thanks,

Jon

This is another possible solution, using S3A:

https://groups.google.com/d/msg/druid-user/VYAySNm7PUw/uCE_vaaCBgAJ
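A hedged sketch of what the S3A variant might look like in jobProperties. These fs.s3a.* property names are from Hadoop 2.7’s S3A documentation, not from the linked post, and S3A additionally requires hadoop-aws (and the AWS SDK) on the classpath:

```json
"jobProperties" : {
  "fs.s3a.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "fs.s3a.access.key" : "MY_ACCESS_KEY",
  "fs.s3a.secret.key" : "MY_SECRET_KEY",
  "fs.s3a.endpoint" : "s3.cn-north-1.amazonaws.com.cn"
}
```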

Jonathan,

I’ve reached the same link and solution as yours to work around the problem. My finding is that uploads.stream-retry-buffer-size has to be bigger than the segment size the local Hadoop indexer is trying to upload to S3; otherwise the upload fails with java.io.IOException: Resetting to invalid mark during SHA-256 hashing. For example, my segment size was 690MB: with uploads.stream-retry-buffer-size=67108865 the upload failed, but with uploads.stream-retry-buffer-size=72108865 it succeeded. The maximum value for the config is Integer.MAX_VALUE (2147483647); anything bigger results in an error.
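The "Resetting to invalid mark" failure can be illustrated with a small, hypothetical Python model of Java's BufferedInputStream mark/reset contract (none of these names come from jets3t itself): the V4 signer reads the entire payload to hash it, then tries to rewind the stream so the body can be re-sent, and the rewind only succeeds if the payload fit inside the retry buffer.

```python
class MarkableStream:
    """Toy model of Java's BufferedInputStream mark/reset contract."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.mark_pos = None
        self.read_limit = 0

    def mark(self, read_limit):
        # A later reset() is only guaranteed to work if at most
        # read_limit bytes are consumed after this call.
        self.mark_pos = self.pos
        self.read_limit = read_limit

    def read(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def reset(self):
        if self.mark_pos is None or self.pos - self.mark_pos > self.read_limit:
            raise IOError("Resetting to invalid mark")
        self.pos = self.mark_pos


def hash_then_rewind(stream, buffer_size):
    # jets3t marks the stream with the retry-buffer size, reads the whole
    # payload to compute the payload hash, then resets so the body can be
    # re-sent. The reset fails once the payload has outgrown the buffer.
    stream.mark(buffer_size)
    while stream.read(8192):
        pass
    try:
        stream.reset()
        return True
    except IOError:
        return False
```

This is why the buffer size must exceed the segment (payload) size: a 100-byte payload rewinds fine with a 200-byte buffer but fails with a 50-byte one.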

Having a proper uploads.stream-retry-buffer-size unblocks batch ingestion for me. However, an upper limit on segment size seems a bit odd (I believe it is due to the combination of Hadoop S3N and jets3t), although Druid does recommend segment sizes of around 600–800MB.

One thing that’s not clear to me: realtime tasks can upload to S3 without any SigV4 configuration, but the batch job cannot. That seems like something Druid could probably change?

The difference between realtime and the batch job is probably that realtime uploads go through SegmentPushers, which use jets3t directly (and fairly simply), but Hadoop uploads go through a Hadoop filesystem layer, which does its own thing.