Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory

Hi,

I am getting the error below when trying to ingest a roughly 137 GB file into Druid.

2023-01-09T15:23:22,479 INFO [task-runner-0-priority-0] org.apache.druid.segment.loading.SegmentLocalCacheManager - Using storage location strategy: [LeastBytesUsedStorageLocationSelectorStrategy]
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f9b06001000, 65536, 1) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/druid-setup/apache-druid-24.0.0/hs_err_pid23128.log
#
# Compiler replay data is saved as:
# /root/druid-setup/apache-druid-24.0.0/replay_pid23128.log

Please could someone help here?

Is this a single 137 GB file? If so, that means a single thread of ingestion will need to process it. You should split it up into many smaller files, which will also allow you to ingest it in parallel.

Thanks. It is not a single file. The 137 GB is broken down into many small files, each about 30 MB in size.

Also i had a look at this article: https://support.imply.io/hc/en-us/articles/360060443474-Historical-will-not-start-due-to-OOM-and-failed-to-map-errors-though-there-is-enough-heap-available-and-ulimit-is-set-very-high

Here it says to increase max_map_count. I didn't quite understand this. Does this mean that the error "Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory" is an indication that more than 65536 segments were created, and that this caused the JVM to fail?

My understanding is that segment size is generally ~600-800 MB, so how can more than 65536 segments end up memory-mapped?
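
For reference, one way to see how close a Druid process actually is to this limit on Linux is to compare its current mapping count against vm.max_map_count (the PID below is just a placeholder for the task or historical process):

cat /proc/sys/vm/max_map_count   # per-process limit on memory map areas
wc -l /proc/<druid_pid>/maps     # map areas the process holds right now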

Let’s take a step back.
Can you share the ingestion spec? Which log shows that error? What does your cluster look like for each process type? “Not enough space” sounds like a disk space issue.
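
For what it's worth, a quick way to rule out plain disk exhaustion on the data nodes is to check free space on the volume backing the task directories and segment cache (the path below is a placeholder for your Druid var directory):

df -h /path/to/druid/var   # free space on the volume Druid writes to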

Thanks @Sergio_Ferragut.

Please find my responses to your questions below:

  1. Can you share the ingestion spec?
    We are using multi-stage query (MSQ) ingestion via SQL. Please find the query below. Also note that 7 tasks were used while running the query.
REPLACE INTO UNSPC_TAXONOMY_fact_extrapo_zip OVERWRITE ALL
WITH ext AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"s3","prefixes":["s3://data-dev/druid_poc_1/denormalised_druid_data/UNSPC_TAXONOMY_fact_extrapo_zip/"]}',
    '{"type":"csv","findColumnsFromHeader":true}',
    '[{"name":"product_id","type":"long"},{"name":"unspsc_1","type":"string"},{"name":"unspsc_2","type":"string"},{"name":"unspsc_3","type":"string"},{"name":"unspsc_4","type":"string"},{"name":"taxonomy_id","type":"long"},{"name":"manf_desc","type":"string"},{"name":"prod_desc","type":"string"},{"name":"sku","type":"string"},{"name":"state_id","type":"long"},{"name":"state_name","type":"string"},{"name":"state_abbr","type":"string"},{"name":"location_id","type":"long"},{"name":"zip_code","type":"long"},{"name":"monthenddatekey","type":"string"},{"name":"sales_year","type":"long"},{"name":"sales_month","type":"long"},{"name":"sales_quarter","type":"long"},{"name":"dist_total_revenue","type":"long"},{"name":"dist_total_units","type":"long"},{"name":"modified_date","type":"string"},{"name":"facility_type","type":"string"}]'
  )
))
SELECT
  TIME_PARSE(monthenddatekey) AS __time,
  product_id,
  unspsc_1,
  unspsc_2,
  unspsc_3,
  unspsc_4,
  taxonomy_id,
  manf_desc,
  prod_desc,
  sku,
  state_id,
  state_name,
  state_abbr,
  location_id,
  zip_code,
  sales_year,
  sales_month,
  sales_quarter,
  dist_total_revenue,
  dist_total_units,
  modified_date,
  facility_type
FROM ext
PARTITIONED BY DAY
  2. Which log shows that error?

Middle Manager

  3. What does your cluster look like for each process type?

[screenshot: cluster configuration for each process type]

Also, the EBS volume mounted on each data node is 2 TB.

Additional info on JVM config:

Historicals:

-server
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=6g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

Indexer:

-server
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=4g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

How many segments do you have in the historicals when you run the ingestion?

Roughly 1200 (which includes all the existing data sources).

What is your max_map_count?

If you have 1200 segments and a replication factor of 2, that is 2400 segments in total. With 8 servers this should be about 300 segments per server. Your max_map_count should be larger than this.

Hi @Vijay_Narayanan1, the max_map_count on the data nodes is 65530.

Also, the number of data nodes is 2. I will just post the cluster config here for reference:
[screenshot: cluster configuration]

One possibility is that the ingestion is creating so many segments that you are exceeding the 65530 limit on the historicals (the task won't succeed until the segments are loaded on the historicals). The number of segments that tasks create depends on how the data in the files is organised. Can you try increasing the max_map_count?
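
For reference, raising it on Linux is usually done with sysctl; the value 262144 below is only an illustrative figure, not a Druid-specific recommendation, so pick something comfortably above the segment count you expect per node:

sudo sysctl -w vm.max_map_count=262144                          # apply to the running kernel
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf   # persist across reboots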

Sure, I will contact DevOps to increase max_map_count. Please suggest what the new number should be.

Regarding segment creation, what I thought is that each segment is created once its size reaches approximately 500-800 MB. By that logic, 137 GB can't go beyond the 65530 limit. These are CSV files, so what causes this tremendous increase in segments?

The logic depends on how the data is organized in the files. Let us say you have daily segment granularity and you have 10 days in each file. Then for each input file the task will create 10 segments. This will result in a huge number of segments.
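
As a rough illustration (the days-per-file figure here is purely an assumption, not taken from your data): 137 GB in ~30 MB files is roughly 4,600 input files. If each file happened to span 30 distinct days at DAY granularity, that is on the order of 4,600 x 30 ≈ 138,000 segments before compaction, versus the ~200 you would expect from 137 GB / ~700 MB if every segment reached the target size.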

OK, and will there be compaction happening after the segments are committed? Because some similar types of files create at most 200 segments for the given data source.

You should enable auto compaction.
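
A minimal sketch of enabling it through the Coordinator's automatic compaction API (it can also be turned on from the web console's Datasources view); the coordinator host is a placeholder, 8081 assumes the default coordinator port, and skipOffsetFromLatest is just an example value to adjust for your cluster:

curl -X POST 'http://<coordinator-host>:8081/druid/coordinator/v1/config/compaction' \
  -H 'Content-Type: application/json' \
  -d '{"dataSource": "UNSPC_TAXONOMY_fact_extrapo_zip", "skipOffsetFromLatest": "P1D"}'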