Hadoop-based batch ingestion exceeds memory limitation

My Hadoop-based batch ingestion task failed with the following message:
2019-01-14 10:27:20,687 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1546568625067_72784_m_000001_0: Container [pid=50155,containerID=container_e200_1546568625067_72784_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2 GB physical memory used; 33.2 GB of 4.2 GB virtual memory used. Killing container.

I added the following settings, but nothing seems to work. Could someone help?

In /conf/druid/_common/mapred-site.xml:

And in my index file:

"jobProperties" : {
  "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.job.user.classpath.first" : "true",
  "mapreduce.job.classloader" : "true",
  "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.map.memory.mb" : 2048,
  "mapreduce.reduce.memory.mb" : 4096,
  "mapreduce.input.fileinputformat.split.minsize" : 125829120,
  "mapreduce.input.fileinputformat.split.maxsize" : 268435456
}

Many thanks,

Your JVM heap size (the mapper/reducer java opts) must fit within mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, respectively. That is what YARN allocates to the containers.
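For example (a sketch only, with hypothetical -Xmx values, assuming the 2048 MB / 4096 MB container sizes from the spec above), the java opts could set explicit heap limits safely below the YARN container sizes:

```json
"jobProperties" : {
  "mapreduce.map.memory.mb" : 2048,
  "mapreduce.map.java.opts" : "-Xmx1536m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.reduce.memory.mb" : 4096,
  "mapreduce.reduce.java.opts" : "-Xmx3072m -Duser.timezone=UTC -Dfile.encoding=UTF-8"
}
```

Keeping -Xmx around 75–80% of the container size leaves headroom for off-heap memory (direct buffers, metaspace, thread stacks), which is counted against the container's physical memory limit.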

Rommel Garcia

Director, Field Engineering

Hi Rommel,

The middleManager is running on a VM with 64 GB of memory, so my jvm.properties has:

I thought this controls how much memory gets used on the middleManager node. The error I got is from the MapReduce job running on a remote Hadoop cluster. That said, I did try changing the memory configuration, but I always got the same error message, which made me think my changes weren't taking effect.

Could you share a sample configuration file that works?

That's not related to the problem, though. It's the Hadoop ingestion that's not allocating a proper memory size for your YARN container.

Rommel Garcia

Director, Field Engineering

Yes, Christine, as Rommel pointed out, the middleManager is not used in Hadoop ingestion. You got the error because your Hadoop YARN cluster was not able to allocate the memory required by your ingestion spec. Please try removing these lines from your spec and see if it helps:

"mapreduce.map.memory.mb" : 2048,
"mapreduce.reduce.memory.mb" : 4096,
"mapreduce.input.fileinputformat.split.minsize" : 125829120,
"mapreduce.input.fileinputformat.split.maxsize" : 268435456
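With those lines removed, the job falls back to the Hadoop cluster's own YARN defaults for container sizing and input splits. A sketch of the trimmed jobProperties, assuming the same opts as the original post:

```json
"jobProperties" : {
  "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.job.user.classpath.first" : "true",
  "mapreduce.job.classloader" : "true"
}
```

If the defaults still produce OOM kills, the next step would be raising the memory.mb values together with matching -Xmx settings, rather than raising one without the other.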