Hadoop-based batch ingestion exceeds memory limits

My Hadoop-based batch ingestion failed with the following message:
2019-01-14 10:27:20,687 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1546568625067_72784_m_000001_0: Container [pid=50155,containerID=container_e200_1546568625067_72784_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2 GB physical memory used; 33.2 GB of 4.2 GB virtual memory used. Killing container.

I added the following settings, but nothing seems to work. Could someone help?

In /conf/druid/_common/mapred-site.xml:

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>

And in my index file:

"jobProperties" : {
  "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.job.user.classpath.first" : "true",
  "mapreduce.job.classloader" : "true",
  "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.map.memory.mb" : 2048,
  "mapreduce.reduce.memory.mb" : 4096,
  "mapreduce.input.fileinputformat.split.minsize" : 125829120,
  "mapreduce.input.fileinputformat.split.maxsize" : 268435456
}

Many thanks,

Your JVM heap size (the mapper/reducer java opts) must fit within mapreduce.map.memory.mb and mapreduce.reduce.memory.mb respectively; those values are what YARN allocates to the containers.
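As a rough illustration of that sizing rule (not a configuration taken from this thread), a jobProperties sketch where each heap is set to about 80% of its container, so -Xmx stays inside what YARN allocates, could look like this; the 1638m and 3276m values are assumptions, not recommendations from the posts above:

  "jobProperties" : {
    "mapreduce.map.java.opts" : "-Xmx1638m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
    "mapreduce.map.memory.mb" : 2048,
    "mapreduce.reduce.java.opts" : "-Xmx3276m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
    "mapreduce.reduce.memory.mb" : 4096
  }

Note that the spec quoted above sets mapreduce.map.java.opts with no -Xmx at all, and properties passed via jobProperties generally take precedence over mapred-site.xml, so the -Xmx1536m from that file may never reach the mapper.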

Rommel Garcia

Director, Field Engineering

Hi Rommel,

The middleManager is running on a VM with 64 GB of memory, so my jvm.properties has:

-server

-Xms64m

-Xmx64m

-XX:MaxDirectMemorySize=32g

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

I thought this controls how much memory gets used on the middleManager node, whereas the error I got is from the MapReduce job running on the remote Hadoop cluster. That said, I did try changing the memory configuration, but I always got the same error message, which made me think my changes weren't taking effect.

Could you share a sample configuration file that works?
Thanks,

That's not related to the problem, though. It's the Hadoop ingest that isn't allocating a proper memory size for your YARN container.

Rommel Garcia

Director, Field Engineering

Yes, Christine, as Rommel pointed out, the MM is not used in Hadoop ingestion. You got the error because your Hadoop YARN cluster could not allocate the memory required by your ingestion spec. Please try removing these lines from your spec and see if it helps (a trimmed sketch follows the list):

"mapreduce.map.memory.mb" : 2048,
"mapreduce.reduce.memory.mb" : 4096,
"mapreduce.input.fileinputformat.split.minsize" : 125829120,
"mapreduce.input.fileinputformat.split.maxsize" : 268435456
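For reference, this is a minimal sketch of what the jobProperties block from the original spec would look like with those four lines removed; the remaining entries are unchanged from the post above:

  "jobProperties" : {
    "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
    "mapreduce.job.user.classpath.first" : "true",
    "mapreduce.job.classloader" : "true",
    "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
  }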