Historical node crashes when data is growing

Our historical node has the following setup:

Disk Space: > 2TB

Memory: 128 GB

The config looks like this:

druid.processing.buffer.sizeBytes=100000000

druid.processing.numThreads=4

druid.extensions.localRepository=/home/druid/.m2_hdfs/repository

druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage:0.8.0-rc1"]

druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]

# Zookeeper

druid.zk.service.host=XXX.XXX.XXX.XXX

# If you choose to compress ZK announcements, you must do so for every node type

druid.announcer.type=batch

druid.curator.compress=true

druid.discovery.curator.path=/hdfs/discovery

druid.segmentCache.locations=[{"path": "/data/druid_hdfs/indexCache", "maxSize": 300000000000}]

druid.server.maxSize=300000000000

# Metadata Storage (MySQL)

druid.metadata.storage.type=mysql

druid.metadata.storage.connector.connectURI=jdbc:mysql://XXX.XXX.XXX.XXX:3306/druid_hdfs

druid.metadata.storage.connector.user=xxxxxxxxxx

druid.metadata.storage.connector.password=xxxxxxxxxxx

druid.storage.type=hdfs

druid.storage.storageDirectory=hdfs://xxxxxxx.xxxxx.xxxxx:8020/user/druid/data

# Query Cache (we use a simple 10 MB heap-based local cache on the broker)

druid.cache.type=local

druid.cache.sizeInBytes=10000000

druid.emitter=logging

druid.emitter.logging.logLevel=debug

I am starting the historical node with the following command:

nohup java -server -Xmx12g -Xms12g \
  -XX:NewSize=6g -XX:MaxNewSize=6g \
  -XX:MaxDirectMemorySize=32g \
  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Duser.timezone=UTC -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
  -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/druid_hdfs \
  -classpath config/_common:config/historical:lib/*:`hadoop classpath` \
  io.druid.cli.Main server historical
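For context, here is a rough back-of-the-envelope of the memory footprint implied by these settings. This is my own sketch, not from the original post, using the common (numThreads + 1) * buffer.sizeBytes rule of thumb for direct memory:

# Illustrative arithmetic only; values taken from the config and JVM flags above.
echo $((100000000 * (4 + 1)))   # direct memory needed by processing buffers: ~0.5 GB, far below -XX:MaxDirectMemorySize=32g
# Heap is fixed at 12 GB (-Xmx12g/-Xms12g). The segment cache itself is
# memory-mapped and served from the OS page cache, so it counts against
# neither heap nor direct memory. These limits are nowhere near 128 GB of RAM,
# so a failed native allocation often points at a per-process limit
# (mmap count, ulimits) rather than exhausted memory.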

Everything worked fine for over a year. But now, as the data grew and the files in my druid.segmentCache.locations became bigger than 115 GB, the server was not able to load any segments anymore and even crashed with messages like:

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (malloc) failed to allocate 28520448 bytes for committing reserved memory.

An error report file with more information is saved as:

/usr/local/druid-0.8.0-rc1/hs_err_pid25770.log

Sometimes it just throws exceptions like:

13:50:57.264 [ZkCoordinator-0] ERROR io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[ …

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[xxxxxxx

Caused by: io.druid.segment.loading.SegmentLoadingException: Error loading [hdfs://xxxxx.xxxx.xxxx

Caused by: java.io.IOException: No FileSystem for scheme: hdfs

But these HDFS exceptions seem to point in the wrong direction. I checked manually with hadoop fs -ls and the files were there.
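For what it's worth, here is a hedged set of checks for whether the Druid process (and not just the shell) can see the HDFS client jars and the Hadoop config. The paths are illustrative and assume the layout from the startup command above:

# Hadoop jars visible to the launching shell:
hadoop classpath | tr ':' '\n' | grep -i hdfs | head
# Hadoop *-site.xml files (core-site.xml, hdfs-site.xml) on the Druid classpath,
# e.g. under config/_common as referenced by -classpath above:
ls config/_common/*-site.xml
# The druid-hdfs-storage extension actually present in the local repository:
find /home/druid/.m2_hdfs/repository -name 'druid-hdfs-storage*'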

Regarding the memory output:

I still had around 22 GB of RAM free at that time. Restarting the historical server resulted in it crashing again with the same messages. Sometimes it is the memory issue, sometimes it's the exceptions regarding "Failed to load segment".

Now that I have marked some old data as used=0 in the coordinator database, the server works fine. But from the first question at http://druid.io/faq.html I understand that I should be able to have more data assigned to a historical node than there is memory available. I can live with the fact that old data will take much longer to query.
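For reference, marking old segments as unused can be done with plain SQL against the MySQL metadata store from the config above. This is only a sketch: it assumes the default druid_segments table name, and the datasource name and cutoff date are placeholders:

# Illustrative only: disable segments of one datasource that end before a cutoff.
mysql -h XXX.XXX.XXX.XXX -u xxxxxxxxxx -p druid_hdfs -e \
  'UPDATE druid_segments SET used = 0 WHERE dataSource = "my_datasource" AND `end` <= "2015-01-01T00:00:00.000Z";'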

I am using Druid 0.8.

So what am I doing wrong? How can I have more data on the historical node than there is memory available?

Thanks for your help.

Best regards

Roman

Hi Roman, this error points to the HDFS extension not being correctly set up. How have you included the HDFS extension and the appropriate Hadoop conf XML files in your classpath?

Hi,

Thanks for your answer. I doubt this is an HDFS problem, as this setup worked perfectly for over a year without any changes. And usually it's not the HDFS error that pops up, but the memory problem.

Can you confirm that I should be able to assign more data to a historical node than there is memory available? And what could be the reason that it runs out of memory when there are still 20 GB free on the system?

Thanks a lot for your help.
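As an aside, not from the thread: a few generic checks can explain a failed native allocation even when free memory is still available, since overcommit and address-space limits bite before RAM runs out. A sketch, to be run on the historical host:

grep -i commit /proc/meminfo                      # CommitLimit vs Committed_AS
sysctl vm.overcommit_memory vm.overcommit_ratio   # kernel overcommit policy
ulimit -v                                         # per-process virtual address space limit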

Hi Roman, yes, you can definitely have more segments stored on disk than available memory.

For the error:
13:50:57.264 [ZkCoordinator-0] ERROR io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[ …

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[xxxxxxx

Caused by: io.druid.segment.loading.SegmentLoadingException: Error loading [hdfs://xxxxx.xxxx.xxxx

Caused by: java.io.IOException: No FileSystem for scheme: hdfs

That suggests to me that the historicals are not able to talk to HDFS and correctly download segments.

For the error:

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (malloc) failed to allocate 28520448 bytes for committing reserved memory.

An error report file with more information is saved as:

/usr/local/druid-0.8.0-rc1/hs_err_pid25770.log

You seem to have allocated more memory than is available on your box. Are you running any other processes on this box that may require a lot of memory?

Hi, no, that's the strange thing. There was still more than 20 GB of free memory when it crashed again today. But after moving to 0.9.1.1 it seems to load a lot more data than there is memory available without crashing. So an update alone seemed to solve the issue. Thanks anyway for your help.

Hi Fangjin,

I have the exact same problem as Roman, but on Druid 0.9.1.1.

Suddenly the historical nodes are crashing because of out-of-memory errors, but there are always about 30 GB of RAM free when they crash.

The configs for the historical nodes are as in Roman’s case.

Error:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f7db0acf000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)

Hi Dražen,

Check how many segments are allocated to your historicals. Linux has a default limit of 65536 memory-mapped areas per process, and if you exceed it you will get OOM errors. If this is the case, it would most likely be beneficial to compact your segments so that they are in the 400-700 MB range; alternatively, you can increase the Linux setting (vm.max_map_count).

Good luck,

–Ben
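A minimal sketch of the checks Ben describes, assuming a Linux host; <historical-pid> and the new limit value are illustrative:

sysctl vm.max_map_count                    # current per-process limit on memory-mapped areas
wc -l < /proc/<historical-pid>/maps        # mappings the historical process currently holds
sudo sysctl -w vm.max_map_count=262144     # raise the limit at runtime
# add "vm.max_map_count=262144" to /etc/sysctl.conf to persist it across reboots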

You may want to check the file handle limit on your system. Druid typically memory-maps segments before loading them.
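A sketch of that check, with <historical-pid> as a placeholder:

ulimit -n                                             # open-file limit of the launching shell
grep -i 'open files' /proc/<historical-pid>/limits    # limit actually applied to the running process
ls /proc/<historical-pid>/fd | wc -l                  # descriptors currently in use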

Thank you, Ben, thank you!

I had 64894 segments to load, and the limit was 65536, just as you said:

2017-01-18T08:48:46,934 INFO [main] io.druid.server.coordination.ZkCoordinator - Loading segment cache file [7095/64894]

But I guess Druid is memory-mapping something else besides the segments?

I changed the vm.max_map_count to a higher value, and now the historical nodes are not failing any more. So this did the trick.

p.s. I am going to have a serious talk with my sys admin, and ask him why he did not suggest this to me earlier. :smiley:

Thanks again,

Drazen

On Tuesday, 17 January 2017 at 17:13:07 UTC+1, Ben Vogan wrote:

Hi team, we were facing the same issue on one of our historical nodes and were not able to figure out what exactly was going wrong. This post came in quite handy and helped resolve the issue. Thanks, everyone.

Link on how to increase the number of memory mapped segments on Linux - https://stackoverflow.com/questions/11683850/how-much-memory-could-vm-use

Regards,

Arpan Khagram

+91 8308993200