Druid batch load with AWS EMR performance issue

Hi,

We recently tested Druid batch load with AWS EMR and experienced random task failures. Our Druid cluster has 4 middleManagers, each with 32 workers, so we kick off 128 batch tasks simultaneously. The batch tasks load data from an AWS S3 bucket into Druid. We are using AWS EMR with S3N, Hadoop version 2.7.3. Does anybody know what can cause these errors?
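For reference, each task is a plain index_hadoop task whose inputSpec reads from S3 over s3n, roughly like the trimmed sketch below (bucket and paths here are placeholders, not our real values; dataSchema and tuningConfig omitted):

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "s3n://example-bucket/input/events-2017-12-08.json"
      }
    }
  }
}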

1. Right after the task gets back from EMR, we see the error ‘java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input’:

2017-12-08T16:50:43,020 INFO [task-runner-0-priority-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-12-08T16:50:43,033 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1512142513844_31933 running in uber mode : false
2017-12-08T16:50:43,034 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 100%
2017-12-08T16:50:43,053 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1512142513844_31933 completed successfully
2017-12-08T16:50:43,128 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 55
	File System Counters
		FILE: Number of bytes read=188182
		.....
	File Output Format Counters
		Bytes Written=0
2017-12-08T16:50:43,276 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_AnalyticsStress-396_2017-12-08T16:39:37.272Z, type=index_hadoop, dataSource=AnalyticsStress-396}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:218) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:224) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.1.jar:0.10.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.1.jar:0.10.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	... 7 more
Caused by: java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: **No content to map due to end-of-input**
 at [Source: org.apache.hadoop.hdfs.client.HdfsDataInputStream@7f2b6237; line: 1, column: 1]
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:127) ~[druid-indexing-hadoop-0.10.1.jar:0.10.1]
	at io.druid.indexer.HadoopDruidIndexerJob$1.run(HadoopDruidIndexerJob.java:88) ~[druid-indexing-hadoop-0.10.1.jar:0.10.1]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:372) ~[druid-indexing-hadoop-0.10.1.jar:0.10.1]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.jar:0.10.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:277) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.10.1.jar:0.10.1]
	... 7 more
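From the stack trace, the exception is thrown while IndexGeneratorJob.getPublishedSegments parses the segment descriptor that the job writes back to HDFS. For what it's worth, handing Jackson a zero-byte stream reproduces the exact message, so it looks like the descriptor file exists but is empty. A minimal standalone repro (my own sketch, not Druid code):

import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class EmptyDescriptorRepro
{
  public static void main(String[] args) throws Exception
  {
    // A zero-byte stream makes Jackson throw JsonMappingException with the
    // same message as the task log: "No content to map due to end-of-input".
    InputStream empty = new ByteArrayInputStream(new byte[0]);
    new ObjectMapper().readValue(empty, Object.class);
  }
}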
2. We got random errors for certain datasource segments saying the index.zip file cannot be found after the batch load. But when I look at the AWS S3 deep storage, ‘index.zip’ is actually named index.zip.0. If I rename index.zip.0 to index.zip, the datasource segment loads fine (a rename sketch follows the log below). Here is a sample error log:

2017-11-30T19:29:19,689 INFO [ZkCoordinator-0] io.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://fsr-bigdata-dev/druid/segments/AnalyticsStress-52/2017-11-28T00:00:00.000Z_2017-11-29T00:00:00.000Z/2017-11-30T18:52:14.735Z/0/index.zip] to outDir[var/druid/segment-cache/AnalyticsStress-52/2017-11-28T00:00:00.000Z_2017-11-29T00:00:00.000Z/2017-11-30T18:52:14.735Z/0]
2017-11-30T19:29:19,753 ERROR [ZkCoordinator-0] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Failed to load segment in current location /opt/druid-0.10.1/var/druid/segment-cache, try next location if any:

{class=io.druid.segment.loading.SegmentLoaderLocalCacheManager, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=IndexFile[s3://fsr-bigdata-dev/druid/segments/AnalyticsStress-52/2017-11-28T00:00:00.000Z_2017-11-29T00:00:00.000Z/2017-11-30T18:52:14.735Z/0/index.zip] does not exist., location=/opt/druid-0.10.1/var/druid/segment-cache}
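As a stopgap I have been renaming the objects by hand; the same thing can be scripted with the AWS SDK for Java (a rough sketch using default-chain credentials; since S3 has no native rename, it is a copy followed by a delete, with bucket and key taken from the log above):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class FixSegmentZipName
{
  public static void main(String[] args)
  {
    // Bucket and segment prefix from the error log above; adjust as needed.
    String bucket = "fsr-bigdata-dev";
    String prefix = "druid/segments/AnalyticsStress-52/"
        + "2017-11-28T00:00:00.000Z_2017-11-29T00:00:00.000Z/"
        + "2017-11-30T18:52:14.735Z/0/";

    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // S3 has no rename: copy index.zip.0 to index.zip, then delete the original.
    s3.copyObject(bucket, prefix + "index.zip.0", bucket, prefix + "index.zip");
    s3.deleteObject(bucket, prefix + "index.zip.0");
  }
}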

Can anybody help me understand why we are getting these random errors? Is this related to AWS S3 performance?

Thanks
Hong