Hello All,
As part of a PoC project I am experimenting with Druid and Amazon S3 as deep storage.
Druid, as part of the HDP stack, is installed on my desktop cluster, with VirtualBox VMs as nodes.
HDP-3.0.1.0
Druid 0.12.1
The primary goal is to pull files from S3 sources and load them into Druid, with S3 configured as deep storage.
I intend to ingest my test data from S3 using the index_hadoop task type.
Current configuration:
```
druid.extensions.loadList=["druid-datasketches", "druid-hdfs-storage", "druid-s3-extensions", "ambari-metrics-emitter", "postgresql-metadata-storage"]
druid.storage.baseKey=druid/segments
druid.storage.bucket=f-daily-funnel-with-dimentions-20180713
druid.storage.storageDirectory=/apps/druid/warehouse
druid.storage.type=s3
druid.s3.accessKey=*******
druid.s3.secretKey=*******
druid.indexer.logs.directory=/user/druid/logs
druid.indexer.logs.s3Bucket=f-daily-funnel-with-dimentions-20180713
druid.indexer.logs.s3Prefix=druid/logs
druid.indexer.logs.type=s3
```
/usr/hdp/3.0.1.0-187/druid:
./extensions/druid-hdfs-storage/hadoop-aws-3.1.1.3.0.1.0-187.jar
./lib/aws-java-sdk-ec2-1.10.77.jar
./lib/aws-java-sdk-core-1.10.77.jar
./extensions/druid-hdfs-storage/aws-java-sdk-kms-1.10.77.jar
./extensions/druid-hdfs-storage/aws-java-sdk-core-1.10.77.jar
./extensions/druid-hdfs-storage/aws-java-sdk-s3-1.10.77.jar
./extensions/druid-hdfs-storage/aws-java-sdk-bundle-1.11.271.jar
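Since the listing above shows two AWS SDK generations side by side (1.10.77 and the 1.11.271 bundle), here is a quick sketch I use to flag artifacts that appear with more than one version under the install directory. This is my own helper, not part of Druid or Hadoop:

```python
import os
import re
from collections import defaultdict

def find_version_mixes(root):
    """Group jar files under `root` by artifact name and report
    artifacts that are present in more than one version."""
    pattern = re.compile(r"^(?P<artifact>[a-zA-Z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$")
    versions = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            m = pattern.match(name)
            if m:
                versions[m.group("artifact")].add(m.group("version"))
    # Keep only artifacts with conflicting versions
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# Example (path is my HDP layout; output depends on the actual install):
#   find_version_mixes("/usr/hdp/3.0.1.0-187/druid")
```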
- I successfully tested loading a TSV file into Druid with the static-s3 firehose:
```
"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "static-s3",
    "uris" : ["s3://f-daily-funnel-with-dimentions-20180713/data/f_daily_funnel_report.tsv"]
  },
  "appendToExisting" : false
},
```
Why not simply stay with the index task type and the firehose? Because in the future we plan to add druid-parquet-extensions and handle data exclusively in Parquet format.
- However, with the index_hadoop task type it doesn't work.
```
"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "static",
    "paths" : "s3n://f-daily-funnel-with-dimentions-20180713/data/f_daily_funnel_report.tsv"
  }
},
"tuningConfig" : {
  "type" : "hadoop",
  "partitionsSpec" : {
    "type" : "hashed",
    "targetPartitionSize" : 5000000
  },
  "jobProperties" : {
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
  },
  "leaveIntermediate" : true
}
```
I always get the following error:
2018-11-14T15:01:31,939 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_fdf_report_hadoop_2018-11-14T15:01:18.061Z, type=index_hadoop, dataSource=fdf_report_hadoop}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
… 7 more
Caused by: java.lang.RuntimeException: java.io.IOException: The s3n:// client to Amazon S3 is no longer available: please migrate to the s3a:// client
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:209) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:368) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:325) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
… 7 more
Caused by: java.io.IOException: The s3n:// client to Amazon S3 is no longer available: please migrate to the s3a:// client
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:82) ~[?:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354) ~[?:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403) ~[?:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371) ~[?:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477) ~[?:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) ~[?:?]
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:522) ~[?:?]
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:110) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200) ~[?:?]
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) ~[?:?]
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) ~[?:?]
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) ~[?:?]
at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:119) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:368) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:325) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
… 7 more
2018-11-14T15:01:31,951 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_fdf_report_hadoop_2018-11-14T15:01:18.061Z] status changed to [FAILED].
2018-11-14T15:01:31,954 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_hadoop_fdf_report_hadoop_2018-11-14T15:01:18.061Z",
"status" : "FAILED",
"duration" : 6654
}
- Meanwhile, if I switch the input path to the s3a:// client, as the error message suggests:
```
"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "static",
    "paths" : "s3a://f-daily-funnel-with-dimentions-20180713/data/f_daily_funnel_report.tsv"
  }
},
"tuningConfig" : {
  "type" : "hadoop",
  "partitionsSpec" : {
    "type" : "hashed",
    "targetPartitionSize" : 5000000
  },
  "jobProperties" : {
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3a.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem"
  },
  "leaveIntermediate" : true
}
```
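As an aside, in case credentials rather than the classpath turn out to be the problem: my understanding from the hadoop-aws documentation is that the s3a client reads its keys from the fs.s3a.* Hadoop properties, which could also be passed via jobProperties. A sketch (property names per the Hadoop docs, values obviously placeholders):

```json
"jobProperties" : {
  "fs.s3a.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "fs.s3a.access.key" : "*******",
  "fs.s3a.secret.key" : "*******"
}
```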
I get this error:
2018-11-14T15:15:10,076 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_fdf_report_hadoop_2018-11-14T15:14:56.372Z, type=index_hadoop, dataSource=fdf_report_hadoop}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.3.0.1.0-187.jar:0.12.1.3.0.1.0-187]
… 7 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus(Lorg/apache/hadoop/fs/Path;Ljava/lang/String;Ljava/util/Set;)Lorg/apache/hadoop/fs/s3a/S3AFileStatus; @274: invokestatic
Reason:
Type 'com/amazonaws/AmazonServiceException' (current frame, stack[2]) is not assignable to 'com/amazonaws/SdkBaseException'
Current Frame:
bci: @274
flags: { }
locals: { 'org/apache/hadoop/fs/s3a/S3AFileSystem', 'org/apache/hadoop/fs/Path', 'java/lang/String', 'java/util/Set', 'java/lang/String', 'com/amazonaws/AmazonServiceException' }
stack: { 'java/lang/String', 'java/lang/String', 'com/amazonaws/AmazonServiceException' }
Bytecode:
0x0000000: 2cb6 00c2 9a01 222a 2cb6 0170 3a04 2c19
0x0000010: 04b6 01c7 b801 6399 001e b200 1b13 0243
0x0000020: b901 3002 00bb 0244 59b2 0141 2b2a b400
0x0000030: 27b7 0245 b0b2 001b 1302 46b9 0130 0200
0x0000040: bb02 4459 1904 b601 c719 04b6 0247 b802
0x0000050: 482b 2a2b b601 522a b400 27b7 0249 b03a
0x0000060: 0419 04b6 024b 1101 949f 000d 1302 4c2b
0x0000070: 1904 b800 a2bf a700 0f3a 0413 024c 2b19
0x0000080: 04b8 00a2 bf2c 12f2 b600 f39a 009b bb00
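If I read the VerifyError right, com.amazonaws.SdkBaseException only exists in AWS SDK 1.11.x, while the lib directory still carries aws-java-sdk-core-1.10.77.jar, so S3AFileSystem (compiled against the newer SDK) may be picking up the older classes at runtime. To check which jars actually provide a given class, I sketched this helper (my own code, not part of any of the tools above):

```python
import zipfile
from pathlib import Path

def jars_containing(root, class_name):
    """Return the jar files under `root` that contain the given class
    (dotted name, e.g. 'com.amazonaws.SdkBaseException')."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in Path(root).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip corrupt or non-zip files named *.jar
            continue
    return sorted(hits)

# Example (my layout; results depend on the actual install):
#   jars_containing("/usr/hdp/3.0.1.0-187/druid", "com.amazonaws.SdkBaseException")
```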
Any help would be appreciated.
Best regards,
Yevgen