Historical JVM configuration - Help urgently required

Hi all,

The persistent OOM errors that I have been seeing in my staging environment have now appeared in production as well, and I urgently need some help to resolve them.

I have been experimenting with the JVM parameters without success; I always end up with OOM conditions. My Historical nodes are not even taking any queries. They start up, load their segments, announce their segments, hit an OOM error and restart.

I am trying to run them in a container with 42g of RAM, and it does not matter if I set Xmx + MaxDirectMemorySize + druid.cache.sizeInBytes to something well below the capacity of the container: I still get OOM. I have even tried running this same configuration (below) in a container with 84g of RAM and still have the same issue. I have also tried setting Xss and MaxMetaspaceSize to see if that resolves the problem, and it does not.

-server
-Xms16g
-Xmx16g
-Xmn5g
-Xss1024k
-XX:MaxMetaspaceSize=4g
-XX:MaxDirectMemorySize=16g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/services/druid/historical_gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=500m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/services/druid/data/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dservicename=historical
-XX:+HeapDumpOnOutOfMemoryError
-XX:+CrashOnOutOfMemoryError
-XX:ErrorFile=/var/log/services/druid/historical_pid_%p.log
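
As a rough sanity check on the flags above, the following Python sketch adds up the regions that the configuration caps explicitly (heap, direct memory, metaspace) and compares the total with the 42g container. The helper names `parse_size` and `explicit_caps` are made up for this illustration. The total leaves out thread stacks, the code cache, GC bookkeeping, druid.cache.sizeInBytes (no value is given in the post) and memory-mapped segment files, none of which are limited by these three flags.

```python
import re

# Size-related flags copied from the configuration above.
JVM_FLAGS = [
    "-Xms16g",
    "-Xmx16g",
    "-Xmn5g",
    "-Xss1024k",
    "-XX:MaxMetaspaceSize=4g",
    "-XX:MaxDirectMemorySize=16g",
]

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a JVM size string such as '16g' or '1024k' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def explicit_caps(flags):
    """Collect the byte limits set by -Xmx, MaxDirectMemorySize and MaxMetaspaceSize."""
    caps = {}
    for flag in flags:
        if flag.startswith("-Xmx"):
            caps["heap"] = parse_size(flag[len("-Xmx"):])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            caps["direct"] = parse_size(flag.split("=", 1)[1])
        elif flag.startswith("-XX:MaxMetaspaceSize="):
            caps["metaspace"] = parse_size(flag.split("=", 1)[1])
    return caps


caps = explicit_caps(JVM_FLAGS)
total_gib = sum(caps.values()) / 1024**3
container_gib = 42  # container size stated in the post
print(f"capped regions: {total_gib:.0f} GiB of {container_gib} GiB")
```

With these flags the capped regions come to 36 GiB, which is below the 42g container, so the explicit limits alone do not account for the failure.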

From the error file:

If you are on Java 8, then I would recommend switching to G1 GC.

Also, we’ve found that the default 1 GB buffer size is far more than we ever need, and shrinking it can help with overall performance. This may not hold for you if you run very large groupBy or topN queries.

Good luck.
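
To make the buffer-size suggestion concrete, here is a minimal Python sketch. It assumes that "buffer size" means druid.processing.buffer.sizeBytes and that the sizing rule given in the Druid documentation applies: direct memory needed is roughly buffer size times (processing threads + merge buffers + 1). The thread and merge-buffer counts below are hypothetical; neither appears anywhere in this thread.

```python
GIB = 1024**3
MIB = 1024**2


def required_direct_memory(buffer_bytes, num_threads, num_merge_buffers):
    """Direct memory needed for processing buffers, per the assumed sizing rule."""
    return buffer_bytes * (num_threads + num_merge_buffers + 1)


# Hypothetical values, NOT taken from the poster's runtime.properties.
num_threads = 15        # stands in for druid.processing.numThreads
num_merge_buffers = 2   # stands in for druid.processing.numMergeBuffers

max_direct = 16 * GIB   # -XX:MaxDirectMemorySize=16g from the post

for buffer_bytes in (1 * GIB, 256 * MIB):
    needed = required_direct_memory(buffer_bytes, num_threads, num_merge_buffers)
    verdict = "fits" if needed <= max_direct else "exceeds"
    print(f"buffer {buffer_bytes / MIB:.0f} MiB -> needs {needed / GIB:.1f} GiB, "
          f"{verdict} the 16 GiB direct memory limit")
```

Under these assumed counts a 1 GiB buffer would need 18 GiB of direct memory, while a 256 MiB buffer would need 4.5 GiB. The real numbers depend on the actual thread and merge-buffer settings.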

Hi Max,

Thanks for your tips. I will try them, but I think there is something fundamentally wrong with my configuration. The host is not running out of memory. The Historical node is just failing to run within the constraints set on the JVM, and does so no matter how much memory I give it. It makes no sense that it ran fine for months in a 42g RAM container and then, from one day to the next, cannot even start in an 84g RAM container.

–Ben
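
Since adding RAM does not change the outcome, it can help to check which limit the OutOfMemoryError message actually names, because several of them are unrelated to heap size. The Python sketch below maps well-known HotSpot OutOfMemoryError message fragments to the limit each usually points at. The sample lines are hypothetical illustrations, not taken from the poster's error file, and `classify_oom` is a name made up for this example.

```python
# Fragments of well-known java.lang.OutOfMemoryError messages, paired with
# the limit each one usually points at.
OOM_HINTS = [
    ("Java heap space", "heap limit (-Xmx)"),
    ("GC overhead limit exceeded", "heap limit (-Xmx), GC is thrashing"),
    ("Metaspace", "class metadata limit (-XX:MaxMetaspaceSize)"),
    ("Direct buffer memory", "direct buffer limit (-XX:MaxDirectMemorySize)"),
    ("native thread", "OS thread or process limit, or native memory"),
    ("Map failed", "mmap failure: address space or OS mapping-count limit"),
]


def classify_oom(line):
    """Return a hint for an OutOfMemoryError log line, or None for other lines."""
    if "OutOfMemoryError" not in line:
        return None
    for fragment, hint in OOM_HINTS:
        if fragment in line:
            return hint
    return "unrecognised OutOfMemoryError message"


# Hypothetical sample lines for illustration only.
samples = [
    "java.lang.OutOfMemoryError: Java heap space",
    "java.lang.OutOfMemoryError: Direct buffer memory",
    "Caused by: java.lang.OutOfMemoryError: Map failed",
]
for line in samples:
    print(f"{line} -> {classify_oom(line)}")
```

The last two categories are not governed by any of the JVM size flags, which would be consistent with a failure that persists regardless of container size. Whether either applies here depends on the message in the error file.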