Hadoop exceeding virtual memory limits

Hey,
I am seeing my Hadoop tasks get killed for exceeding virtual memory limits when I try to load files into Druid using a Hadoop cluster.

"Container [pid=7329,containerID=container_e156_1536304706671_58132_01_000017] is running beyond virtual memory limits. Current usage: 503.3 MB of 3 GB physical memory used; 32.8 GB of 6.3 GB virtual memory used. Killing container."
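If I am reading the numbers right, the 6.3 GB limit looks like the 3 GB physical allocation multiplied by YARN's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1 (an assumption on my part, since I cannot see the cluster config):

```python
# Sketch of the arithmetic behind YARN's virtual-memory check, assuming
# the default yarn.nodemanager.vmem-pmem-ratio of 2.1 (I cannot verify
# the cluster's actual setting).
physical_limit_gb = 3.0    # "3 GB physical memory" from the error
vmem_pmem_ratio = 2.1      # assumed YARN default for yarn.nodemanager.vmem-pmem-ratio
vmem_limit_gb = round(physical_limit_gb * vmem_pmem_ratio, 1)

vmem_used_gb = 32.8        # "32.8 GB ... virtual memory used" from the error
print(vmem_limit_gb)                 # 6.3 -> matches the limit in the message
print(vmem_used_gb > vmem_limit_gb)  # True -> YARN kills the container
```

So the limit itself seems consistent with a default ratio; what I do not understand is why the JVM's virtual size balloons past 32 GB while resident memory stays around 500 MB.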

My ingestion task is as follows:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : ["crid"]
          },
          "timestampSpec" : {
            "column" : "time_stamp",
            "format" : "yyyy-MM-dd HH:mm:ss"
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "HOUR",
        "queryGranularity" : "none",
        "intervals" : ["2018-11-17T00:00:00.000Z/2018-11-18T00:00:00.000Z"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/user/data/mydata.gz"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "leaveIntermediate" : true,
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {
        "mapreduce.job.user.classpath.first" : "true",
        "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.map.memory.mb" : 1024,
        "mapreduce.reduce.memory.mb" : 1024
      }
    }
  },
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.3"]
}

I cannot change any configuration on the Hadoop cluster itself.

Hadoop Error Log is as follows:

Container [pid=6763,containerID=container_e156_1536304706671_58132_01_000015] is running beyond virtual memory limits. Current usage: 120.5 MB of 3 GB physical memory used; 32.7 GB of 6.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_e156_1536304706671_58132_01_000015 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 6763 6761 6763 6763 (bash) 3 3 11489280 711 /bin/bash -c /opt/java/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/data4/yarn-nm-local-dir/usercache/pathikrit.g/appcache/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop/yarn-containers/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 172.16.201.53 40957 attempt_1536304706671_58132_m_000000_0 171523813933071 1>/var/log/hadoop/yarn-containers/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015/stdout 2>/var/log/hadoop/yarn-containers/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015/stderr
|- 6972 6763 6763 6763 (java) 131 10 35102199808 30129 /opt/java/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/data4/yarn-nm-local-dir/usercache/pathikrit.g/appcache/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop/yarn-containers/application_1536304706671_58132/container_e156_1536304706671_58132_01_000015 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 172.16.201.53 40957 attempt_1536304706671_58132_m_000000_0 171523813933071
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

I am not able to figure out why so much virtual memory is being used. I tried reducing the size of the data that goes into Hadoop from the data source, but that did not help. I have also tried increasing mapreduce.map.memory.mb to 8196, but that did not help either.
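For reference, the memory bump I tried was along these lines (only the relevant jobProperties key shown):

```
"jobProperties" : {
  "mapreduce.map.memory.mb" : 8196
}
```

Since the usage that trips the limit is virtual rather than physical, I suspect raising the physical allocation alone may not address it, but I am not sure.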

Can anyone suggest reasons why I am getting this error, and how I can fix it? I will be glad to provide any further details you may require. Quick help would be greatly appreciated.