Container running beyond virtual memory limits

Hi,

I’m trying to set up Druid on my EMR cluster, which has 1 master node (r5d.4xlarge: 16 vCores, 128 GiB memory, 600 GB SSD storage) and 0 core nodes.

I am trying to ingest a Parquet file (approx. 150 MB, 86 columns and 1.2 million rows) from an S3 bucket, but I’m getting the following error:

attempt_1564002429381_0004_r_000000_3: Container [pid=16676,containerID=container_1564002429381_0004_01_000015] is running beyond virtual memory limits. Current usage: 1.4 GB of 6 GB physical memory used; 33.0 GB of 30 GB virtual memory used. Killing container.

These are my ioConfig and tuningConfig:

"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "static",
    "inputFormat" : "org.apache.druid.data.input.parquet.DruidParquetInputFormat",
    "paths" : "s3a://<bucket_name>/<file_name>.snappy.parquet"
  }
},
"tuningConfig" : {
  "type" : "hadoop",
  "partitionsSpec" : {
    "targetPartitionSize" : 5000000
  },
  "jobProperties" : {
    "fs.s3.awsAccessKeyId" : "<Access_Key>",
    "fs.s3.awsSecretAccessKey" : "<Secret_Key>",
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.awsAccessKeyId" : "<Access_Key>",
    "fs.s3n.awsSecretAccessKey" : "<Secret_Key>",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec",
    "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
    "mapreduce.job.user.classpath.first" : "true",
    "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
  }
}

I haven’t set the mapreduce.reduce.memory.mb property anywhere.

Any help would be greatly appreciated.

Thanks,

Darshan

Hi,

You might have to set/tune the properties below:

mapreduce.map.java.opts - set an explicit heap size (-Xmx)
mapreduce.reduce.java.opts - set an explicit heap size (-Xmx)
mapreduce.map.memory.mb - set it higher than the -Xmx in mapreduce.map.java.opts (usually about 20% higher)
mapreduce.reduce.memory.mb - set it higher than the -Xmx in mapreduce.reduce.java.opts (usually about 20% higher)
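In the jobProperties of your tuningConfig this could look roughly like the sketch below. The -Xmx4096m and 5120 values are purely illustrative assumptions, not recommendations; size them for your own containers:

"mapreduce.map.java.opts" : "-Xmx4096m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
"mapreduce.map.memory.mb" : "5120",
"mapreduce.reduce.java.opts" : "-Xmx4096m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
"mapreduce.reduce.memory.mb" : "5120"

Here 5120 MB is roughly 25% above the 4096 MB heap, which leaves headroom for the JVM’s off-heap memory so the container is not killed for exceeding its limit.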

Thanks,

Sashi

Thanks Sashi. I did what you said and it works now!

Thanks,

Darshan

Hi,

Just one question: if I don’t set the mapreduce.reduce.memory.mb property, shouldn’t it utilize the entire memory available to it? Why do we have to set this manually?

Thanks,

Darshan

If it’s not set explicitly, it falls back to a default value, which may not be sufficient.
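As a rough sketch of where the limits in the original error come from (inferred from the log, not from checking the actual EMR settings): YARN caps each container at its configured size (here mapreduce.reduce.memory.mb) for physical memory, and at that size multiplied by yarn.nodemanager.vmem-pmem-ratio for virtual memory.

physical limit = 6 GB
virtual limit  = 6 GB x ~5 (apparent vmem-pmem ratio) = 30 GB, exceeded at 33.0 GB

Raising mapreduce.reduce.memory.mb raises both caps, which is why setting it explicitly made the "beyond virtual memory limits" kill go away.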