Container is running beyond physical memory limits

Hi,

I’m trying to ingest data from HDFS, but I keep getting errors like the following. It seems the task is getting too much data to handle, but since my timestamps are dates only, I can’t break the data down into any finer granularity. Any suggestions on how to solve this issue? Thanks!

Container [pid=70947,containerID=container_e22_1432969244945_6026_01_000075] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. Killing container.

Here is more of the log:

2015-07-30T00:36:19,528 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
2015-07-30T00:36:29,554 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 15%
2015-07-30T00:36:37,577 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 17%
2015-07-30T00:36:40,586 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 23%
2015-07-30T00:36:46,601 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 26%
2015-07-30T00:36:49,609 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 32%
2015-07-30T00:36:57,628 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 45%
2015-07-30T00:37:00,635 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 66%
2015-07-30T00:37:03,643 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 67%
2015-07-30T00:37:49,754 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 68%
2015-07-30T00:38:43,888 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 69%
2015-07-30T00:39:35,006 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 70%
2015-07-30T00:40:30,140 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 71%
2015-07-30T00:41:21,270 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 72%
2015-07-30T00:42:16,396 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 73%
2015-07-30T00:45:19,858 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_6026_r_000000_2, Status : FAILED
Container [pid=77928,containerID=container_e22_1432969244945_6026_01_000077] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. Killing container.
Dump of the process-tree for container_e22_1432969244945_6026_01_000077 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 77932 77928 77928 77928 (java) 76785 9822 6220464128 1049133 /usr/lib/jvm/j2sdk1.8-oracle/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx3460300800 -Djava.io.tmpdir=/mnt/hdfs_12o/yarn/nm/usercache/qi_wang/appcache/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.123.204.75 60033 attempt_1432969244945_6026_r_000000_2 77
|- 77928 77926 77928 77928 (bash) 0 0 9822208 289 /bin/bash -c /usr/lib/jvm/j2sdk1.8-oracle/jre/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx3460300800 -Djava.io.tmpdir=/mnt/hdfs_12o/yarn/nm/usercache/qi_wang/appcache/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.123.204.75 60033 attempt_1432969244945_6026_r_000000_2 77 1>/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/stdout 2>/var/log/hadoop-yarn/container/application_1432969244945_6026/container_e22_1432969244945_6026_01_000077/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2015-07-30T00:45:20,862 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
2015-07-30T00:45:41,910 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 11%
2015-07-30T00:45:44,917 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 13%
2015-07-30T00:45:53,938 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 18%
2015-07-30T00:45:56,945 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 22%
2015-07-30T00:46:03,965 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 25%
2015-07-30T00:46:06,972 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 32%
2015-07-30T00:46:14,992 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 52%
2015-07-30T00:46:17,999 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 67%
2015-07-30T00:47:12,128 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 68%
2015-07-30T00:48:06,250 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 69%
2015-07-30T00:49:01,361 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 70%
2015-07-30T00:49:52,465 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 71%
2015-07-30T00:50:43,574 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 72%
2015-07-30T00:51:35,685 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 73%
2015-07-30T00:54:30,044 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 100%
2015-07-30T00:54:31,052 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_6026 failed with state FAILED due to: Task failed task_1432969244945_6026_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

2015-07-30T00:54:31,139 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=810738421
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2326661134
HDFS: Number of bytes written=0
HDFS: Number of read operations=219
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed reduce tasks=4
Launched map tasks=73
Launched reduce tasks=4
Data-local map tasks=61
Rack-local map tasks=12
Total time spent by all maps in occupied slots (ms)=1428866
Total time spent by all reduces in occupied slots (ms)=2184837
Total time spent by all map tasks (ms)=1428866
Total time spent by all reduce tasks (ms)=2184837
Total vcore-seconds taken by all map tasks=1428866
Total vcore-seconds taken by all reduce tasks=2184837
Total megabyte-seconds taken by all map tasks=5852635136
Total megabyte-seconds taken by all reduce tasks=8949092352
Map-Reduce Framework
Map input records=30597989
Map output records=30597989
Map output bytes=3795354607
Map output materialized bytes=802113853
Input split bytes=10001
Combine input records=0
Spilled Records=30597989
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=277433
CPU time spent (ms)=5533170
Physical memory (bytes) snapshot=123575734272
Virtual memory (bytes) snapshot=395487993856
Total committed heap usage (bytes)=224425672704
File Input Format Counters
Bytes Read=2326651133

-Xmx3460300800 (roughly a 3.2 GB heap) is a bit high for a 4 GB YARN container, given that the Druid indexer allocates a fair amount of off-heap memory on top of the heap. I usually use these settings:

mapreduce.map.memory.mb=2048

mapreduce.map.java.opts="-server -Xmx1536m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

mapreduce.reduce.memory.mb=6144

mapreduce.reduce.java.opts="-server -Xmx2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

Hi Gian,

Thanks for the help! Where do I put those configurations? I tried putting them in config/overlord/runtime.properties, but that doesn’t seem to work. I also saw the Hadoop configuration page (http://druid.io/docs/latest/configuration/hadoop.html), but we haven’t really used it before.

Thanks!

If you have your Hadoop config XMLs on the classpath, you can add those configs to mapred-site.xml.
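For example, here’s a minimal mapred-site.xml sketch with the reduce-side settings from above (the values are just the suggested starting points; tune them to your cluster, and add the mapreduce.map.* properties the same way):

<configuration>
  <!-- Total memory YARN grants each reduce container -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>6144</value>
  </property>
  <!-- JVM heap for the reduce task; kept well below the container
       size to leave headroom for off-heap allocations -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-server -Xmx2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8</value>
  </property>
</configuration>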

I see. Are there any settings we can apply on the Druid side instead of on the YARN side?

They could still be added on the Druid side: you can definitely put different XMLs on the Druid classpath than on the YARN NodeManager classpath. The ones on the Druid classpath will take priority, since they’ll be the ones present at job submission.

Although if you’d rather use the same XMLs everywhere, you can also add the properties to the “jobProperties” of the “tuningConfig” in the Druid indexing spec.
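As a rough sketch (with the surrounding spec fields elided), that section would look something like:

"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "mapreduce.reduce.memory.mb" : "6144",
    "mapreduce.reduce.java.opts" : "-server -Xmx2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8"
  }
}

Those jobProperties get set on the Hadoop job configuration at submission time, so for that indexing job they take precedence over the cluster defaults (unless a property is marked final in the cluster config).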

Got it. Will try. Thanks!