Index task for Kafka with start of stream fails with "java.lang.OutOfMemoryError: Map failed"

We are benchmarking Druid and triggered a Kafka drain (indexing a topic from the beginning) to see how quickly it indexes and catches up to real time. The index task was created with HOUR granularity. In the first run, it created about 8 segments and then failed with the following error:

```
2020-04-01T14:41:13,729 INFO [[index_kafka_test_topic_backfill_22c8d9889b2cd74_cliffeef]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Flushed in-memory data for segment[test_topic_backfill_2020-03-27T18:00:00.000Z_2020-03-27T19:00:00.000Z_2020-04-01T12:35:18.978Z_1] spill[6238] to disk in [6] ms (123 rows).
2020-04-01T14:41:13,728 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Map failed
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:624) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:278) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:164) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_172]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
Caused by: java.lang.RuntimeException: java.io.IOException: Map failed
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1360) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.access$100(AppenderatorImpl.java:103) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl$3.call(AppenderatorImpl.java:544) ~[druid-server-0.17.0.jar:0.17.0]
    ... 4 more
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) ~[?:1.8.0_172]
    at com.google.common.io.Files.map(Files.java:864) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:851) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:818) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:790) ~[guava-16.0.1.jar:?]
    at org.apache.druid.java.util.common.io.smoosh.SmooshedFileMapper.mapFile(SmooshedFileMapper.java:132) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO$V9IndexLoader.load(IndexIO.java:537) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO.loadIndex(IndexIO.java:194) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO.loadIndex(IndexIO.java:185) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1348) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.access$100(AppenderatorImpl.java:103) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl$3.call(AppenderatorImpl.java:544) ~[druid-server-0.17.0.jar:0.17.0]
    ... 4 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[?:1.8.0_172]
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937) ~[?:1.8.0_172]
    at com.google.common.io.Files.map(Files.java:864) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:851) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:818) ~[guava-16.0.1.jar:?]
    at com.google.common.io.Files.map(Files.java:790) ~[guava-16.0.1.jar:?]
    at org.apache.druid.java.util.common.io.smoosh.SmooshedFileMapper.mapFile(SmooshedFileMapper.java:132) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO$V9IndexLoader.load(IndexIO.java:537) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO.loadIndex(IndexIO.java:194) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.IndexIO.loadIndex(IndexIO.java:185) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1348) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.access$100(AppenderatorImpl.java:103) ~[druid-server-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl$3.call(AppenderatorImpl.java:544) ~[druid-server-0.17.0.jar:0.17.0]
    ... 4 more
2020-04-01T14:41:15,959 ERROR [[index_kafka_test_topic_backfill_22c8d9889b2cd74_cliffeef]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Incremental persist failed: {class=org.apache.druid.segment.realtime.appenderator.AppenderatorImpl, segment=test_topic_backfill_2020-03-27T18:00:00.000Z_2020-03-27T19:00:00.000Z_2020-04-01T12:35:18.978Z_1, dataSource=test_topic_backfill, count=6238}
2020-04-01T14:41:15,959 ERROR [[index_kafka_test_topic_backfill_22c8d9889b2cd74_cliffeef]-appenderator-persist] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Persist failed, dying
2020-04-01T14:41:15,965 INFO [[index_kafka_test_topic_backfill_22c8d9889b2cd74_cliffeef]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Flushed in-memory data for segment[test_topic_backfill_2020-03-27T18:00:00.000Z_2020-03-27T19:00:00.000Z_2020-04-01T12:35:18.978Z_1] spill[6238] to disk in [5] ms (123 rows).
2020-04-01T14:41:17,918 ERROR [[index_kafka_test_topic_backfill_22c8d9889b2cd74_cliffeef]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Incremental persist failed: {class=org.apache.druid.segment.realtime.appenderator.AppenderatorImpl, segment=test_topic_backfill_2020-03-27T18:00:00.000Z_2020-03-27T19:00:00.000Z_2020-04-01T12:35:18.978Z_1, dataSource=test_topic_backfill, count=6238}
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f2a2a793000, 262144, 0) failed; error='Cannot allocate memory' (errno=12)
```

Hi Siva Prasanna,
Can you post your Kafka ingestion spec? I want to see whether we can make any changes in the spec to make it work.

For my understanding, when you say "from the beginning", what is the duration of the data in the Kafka topic? Is it one month, three months, or six months?

What is the total size of the data in the Kafka topic you are trying to ingest?

Can you also give some details about your Druid cluster? (How many master and data nodes, and what is their configuration in terms of CPU, RAM, storage, etc.?)

Thanks,

–siva

Hi Siva, please find the ingestion spec below.

And my cluster setup is:

4 Data Servers (Historical/MiddleManager) - 16 GB RAM, 6 cores

1 Coordinator-Overlord node - 16 GB RAM, 8 cores

1 Broker-Router node - 16 GB RAM, 8 cores

JVM config:

Historical: 3 GB max heap, 10 GB MaxDirectMemory

MiddleManager: 128 MB heap

Task configuration: 1 GB heap, 4 GB MaxDirectMemory, numMergeBuffers=2, numThreads=2, sizeBytes=256000000

My topic has about 3-4 days of data. The ingestion rate was ~500 records per second, with 8 fields each. The fields are not that big; they are all string fields with simple values.

Ingestion spec:

```json
{
  "type": "kafka",
  "ioConfig": {
    "type": "kafka",
    "consumerProperties": {
      "bootstrap.servers": "<BOOTSTRAP_SERVERS>"
    },
    "topic": "druid_ingestion_poc",
    "inputFormat": {
      "type": "json"
    }
  },
  "tuningConfig": {
    "type": "kafka"
  },
  "dataSchema": {
    "dataSource": "hour_level_druid_ingestion_poc",
    "granularitySpec": {
      "type": "uniform",
      "queryGranularity": "HOUR",
      "segmentGranularity": "HOUR",
      "rollup": true
    },
    "timestampSpec": {
      "column": "event_time",
      "format": "millis"
    },
    "dimensionsSpec": {
      "dimensions": [
        "dim_1",
        "dim_2",
        "dim_3",
        "dim_3",
        "dim_4",
        "dim_5"
      ]
    },
    "metricsSpec": [
      {
        "name": "m_1",
        "type": "thetaSketch",
        "fieldName": "d_1"
      },
      {
        "name": "m_2",
        "type": "thetaSketch",
        "fieldName": "d_2"
      }
    ]
  }
}
```

Hi Siva Prasanna,
As per https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html, the sample spec contains explicit `taskDuration` and `maxRowsPerSegment` settings; it may be worth setting those explicitly rather than relying on the defaults.
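For reference, the docs set `taskDuration` in the ioConfig and `maxRowsPerSegment` in the tuningConfig. A sketch of those two sections with the settings spelled out (the values shown are just the documented defaults, not tuned recommendations) might look like:

```json
{
  "ioConfig": {
    "type": "kafka",
    "taskDuration": "PT1H"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  }
}
```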

Hi Siva.

If we don't set taskDuration, it uses PT1H by default, which should be fine for our load. And regarding maxRowsPerSegment, we don't actually reach that level: we are running our benchmark at approximately 500 events per second and rarely exceed 200,000 rows per segment, so reducing it from the default 5,000,000 to 1,000,000 won't help, I believe.
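One more data point worth checking: the failing call is `FileChannelImpl.map`, and the logs show the task had reached spill[6238] for a single segment. Since each spill file gets memory-mapped during persist, a plausible cause (this is an assumption on my part, not something confirmed by the logs) is the JVM process hitting the kernel's per-process memory-mapping limit rather than running out of heap. On a Linux data node that limit can be inspected and raised like this:

```shell
# Current per-process limit on memory-mapped regions
# (the Linux default is commonly 65530, easily exceeded by thousands of spills)
cat /proc/sys/vm/max_map_count

# To raise it for the current boot (262144 is an illustrative value,
# not a recommendation; persist the change via /etc/sysctl.conf):
#   sudo sysctl -w vm.max_map_count=262144
```

If the count of live mappings in `/proc/<task-pid>/maps` is near that limit when the task dies, raising it (or reducing the number of spills via tuningConfig) should help.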

Thanks. :)