Exception while batch ingesting through a Lambda job

Hi,

We are facing an exception in Hadoop MapReduce indexing while doing batch ingestion through a Lambda job.

We tried changing the size of the EMR Hadoop cluster (setting memory to 12 GB instead of 16 GB), but that did not help; we still see the same issue.

Here is the relevant log output and stack trace:

2018-03-15T17:27:05,819 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobResourceUploader - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

2018-03-15T17:27:05,829 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).

2018-03-15T17:27:08,529 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 144

2018-03-15T17:27:08,884 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - number of splits:144

2018-03-15T17:27:08,969 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1515484679230_0010

60.266: [GC pause (G1 Evacuation Pause) (young), 0.0325226 secs]

[Parallel Time: 20.4 ms, GC Workers: 23]

[GC Worker Start (ms): Min: 60266.9, Avg: 60267.5, Max: 60268.2, Diff: 1.3]

[Ext Root Scanning (ms): Min: 0.1, Avg: 1.3, Max: 16.3, Diff: 16.2, Sum: 30.8]

[Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.2, Diff: 1.2, Sum: 7.9]

[Processed Buffers: Min: 0, Avg: 1.2, Max: 7, Diff: 7, Sum: 28]

[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 2.0]

[Code Root Scanning (ms): Min: 0.0, Avg: 0.9, Max: 6.2, Diff: 6.2, Sum: 21.8]

[Object Copy (ms): Min: 3.2, Avg: 16.1, Max: 18.5, Diff: 15.3, Sum: 369.5]

[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 2.5]

[Termination Attempts: Min: 1, Avg: 128.0, Max: 159, Diff: 158, Sum: 2945]

[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.5, Sum: 4.7]

[GC Worker Total (ms): Min: 18.4, Avg: 19.1, Max: 19.7, Diff: 1.3, Sum: 439.1]

[GC Worker End (ms): Min: 60286.4, Avg: 60286.6, Max: 60286.8, Diff: 0.4]

[Code Root Fixup: 1.8 ms]

[Code Root Purge: 0.1 ms]

[Clear CT: 0.9 ms]

[Other: 9.3 ms]

[Choose CSet: 0.0 ms]

[Ref Proc: 6.1 ms]

[Ref Enq: 0.0 ms]

[Redirty Cards: 0.7 ms]

[Humongous Register: 0.3 ms]

[Humongous Reclaim: 0.0 ms]

[Free CSet: 1.8 ms]

[Eden: 528.0M(528.0M)->0.0B(523.0M) Survivors: 34.0M->39.0M Heap: 569.0M(938.0M)->45.5M(938.0M)]

[Times: user=0.39 sys=0.06, real=0.04 secs]

2018-03-15T17:27:09,117 INFO [task-runner-0-priority-0] org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.

2018-03-15T17:27:10,551 INFO [task-runner-0-priority-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1515484679230_0010

2018-03-15T17:27:10,582 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - The url to track the job: http://ip-10-52-13-196.us-west-2.compute.internal:20888/proxy/application_1515484679230_0010/

2018-03-15T17:27:10,583 INFO [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job eventstream_ds-determine_partitions_hashed-Optional.of([2018-03-13T00:00:00.000Z/2018-03-15T00:00:00.000Z]) submitted, status available at: http://ip-10-52-13-196.us-west-2.compute.internal:20888/proxy/application_1515484679230_0010/

2018-03-15T17:27:10,583 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Running job: job_1515484679230_0010

2018-03-15T17:27:26,779 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1515484679230_0010 running in uber mode : false

2018-03-15T17:27:26,781 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%

2018-03-15T17:27:27,816 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_1515484679230_0010 failed with state KILLED due to: REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:14592, vCores:1> maxContainerCapability:<memory:11520, vCores:128>

Job received Kill while in RUNNING state.

2018-03-15T17:27:27,884 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 10

Job Counters

Killed map tasks=144

Killed reduce tasks=2

Total time spent by all maps in occupied slots (ms)=0

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=0

Total time spent by all reduce tasks (ms)=0

Total vcore-milliseconds taken by all map tasks=0

Total vcore-milliseconds taken by all reduce tasks=0

Total megabyte-milliseconds taken by all map tasks=0

Total megabyte-milliseconds taken by all reduce tasks=0

2018-03-15T17:27:27,887 ERROR [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob - Job failed: job_1515484679230_0010

2018-03-15T17:27:27,887 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/eventstream_ds/2018-03-15T172608.616Z/4a7cd307431842ce97a5342099691151]

2018-03-15T17:27:27,910 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_eventstream_ds_2018-03-15T17:26:08.628Z, type=index_hadoop, dataSource=eventstream_ds}]

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_92]

at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_92]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_92]

at java.lang.Thread.run(Unknown Source) [?:1.8.0_92]

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_92]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_92]

at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_92]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

... 7 more

Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.DetermineHashedPartitionsJob] failed!

at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]

at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_92]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_92]

at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_92]

at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]

... 7 more

2018-03-15T17:27:27,917 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_eventstream_ds_2018-03-15T17:26:08.628Z] status changed to [FAILED].

2018-03-15T17:27:27,920 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_hadoop_eventstream_ds_2018-03-15T17:26:08.628Z",

"status" : "FAILED",

"duration" : 75907

}

2018-03-15T17:27:27,927 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@10980560].

2018-03-15T17:27:27,927 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@22e2266d]

2018-03-15T17:27:27,927 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/10.52.15.71:8100]

2018-03-15T17:27:27,939 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@5b35c7d].

2018-03-15T17:27:27,939 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/10.52.15.71:8100]

2018-03-15T17:27:27,942 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/10.52.15.71:8100]

2018-03-15T17:27:27,942 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.query.lookup.LookupReferencesManager.stop()] on object[io.druid.query.lookup.LookupReferencesManager@2de6f1bc].

2018-03-15T17:27:27,942 INFO [main] io.druid.query.lookup.LookupReferencesManager - Stopping lookup factory references manager

2018-03-15T17:27:27,946 INFO [main] org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@27585351{HTTP/1.1}{0.0.0.0:8100}

2018-03-15T17:27:27,947 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@674da77b{/,null,UNAVAILABLE}

2018-03-15T17:27:27,951 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.log.FileRequestLogger.stop()] on object[io.druid.server.log.FileRequestLogger@43c57161].

2018-03-15T17:27:27,951 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@3c380bd8].

2018-03-15T17:27:27,952 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@37c5284a].

2018-03-15T17:27:27,952 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@31120021].

2018-03-15T17:27:27,955 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@286090c].

2018-03-15T17:27:27,956 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@2d55e826].

2018-03-15T17:27:27,956 INFO [main] io.druid.curator.CuratorModule - Stopping Curator

2018-03-15T17:27:27,957 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting

2018-03-15T17:27:27,960 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x2620a1b66f100cc closed

2018-03-15T17:27:27,960 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2620a1b66f100cc

2018-03-15T17:27:27,960 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@2db4ad1].

2018-03-15T17:27:27,975 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.namespace.NamespacedExtractionModule$NamespaceStaticConfiguration.stop()] on object[io.druid.server.namespace.NamespacedExtractionModule$NamespaceStaticConfiguration@2daf06fc].

2018-03-15T17:27:27,976 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@ba17be6].

2018-03-15T17:27:27,977 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@16a35bd].

2018-03-15T17:27:27,977 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@4ee25d80].

Heap

garbage-first heap total 960512K, used 155135K [0x0000000640000000, 0x0000000640101d50, 0x00000007c0000000)

region size 1024K, 146 young (149504K), 39 survivors (39936K)

Metaspace used 83363K, capacity 84380K, committed 85120K, reserved 1124352K

class space used 10506K, capacity 10778K, committed 10880K, reserved 1048576K

Any help would be highly appreciated, as this is blocking our release.

We have tried several approaches to narrow down the issue, but we have not been able to find the root cause, or even where the problem lies.
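From the failure message above, the reducer container request (<memory:14592>) is larger than the cluster's maxContainerCapability (<memory:11520>), which seems to be why YARN kills the job. Would overriding the Hadoop memory settings through jobProperties in the tuningConfig of our ingestion spec be the right way to bring the request under that limit? Below is a rough sketch of what we are considering; the values are placeholders we picked only to stay below 11520 MB, not something we have verified:

"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "mapreduce.map.memory.mb" : "4096",
    "mapreduce.map.java.opts" : "-Xmx3276m",
    "mapreduce.reduce.memory.mb" : "10240",
    "mapreduce.reduce.java.opts" : "-Xmx8192m"
  }
}

Or should this instead be fixed on the EMR side, for example by raising yarn.scheduler.maximum-allocation-mb so the cluster can grant larger containers? Any guidance on which layer to change would help.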

Thanks,

Pravesh Gupta