Historical process exits with no error or exception

Dear all,

The historical process in my cluster always exits, with no error or exception in the logs. I didn’t run any queries or other operations.

The configuration of the historical process is:

-server

-Xms64m

-Xmx64m

-XX:MaxDirectMemorySize=6144m

-Duser.timezone=UTC+8

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

The Druid version is 0.9.1.1.

The server machine’s memory info is (in MB):

                     total       used       free     shared    buffers     cached
Mem:                  7872       2752       5120          0         79        230
-/+ buffers/cache:                2442       5429
Swap:                    0          0          0

Do I need to increase the -Xms and -Xmx parameters?

Thanks.

Yufeng Wang

historical.log.gz (293 KB)

That looks like somebody is killing the historical process. I’d suspect either the Linux OOM killer or something else automatically killing the JVM for some reason.
If you are on Linux, /var/log/syslog may have some info if it’s the OOM killer.
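
For example, you could check with something like the following (standard Linux commands; the exact log path varies by distribution, some use /var/log/messages instead of /var/log/syslog):

dmesg | grep -i -E 'killed process|out of memory'

grep -i 'oom' /var/log/syslog

If the OOM killer is responsible, you should see a line naming the java process and the PID it killed.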

Hi Yufeng,

Your -Xms and -Xmx settings are indeed very small, and it is highly likely that you need to increase them. Here is a sample production configuration: http://druid.io/docs/latest/configuration/production-cluster.html
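
Purely as an illustration for an ~8 GB machine (these numbers are assumptions, not a recommendation): heap plus MaxDirectMemorySize plus OS overhead has to fit within physical RAM, and MaxDirectMemorySize still has to cover druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1). With that in mind, a sketch might look like:

-Xms2g

-Xmx2g

-XX:MaxDirectMemorySize=4g

The right values depend on your segment load and processing buffer settings, so treat these only as a starting point for your own sizing.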

Also, I highly recommend that you add something like the following to your jvm.config:

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCDateStamps

-XX:+UseGCLogFileRotation

-XX:NumberOfGCLogFiles=5

-XX:GCLogFileSize=500m

-XX:+HeapDumpOnOutOfMemoryError

-XX:+CrashOnOutOfMemoryError

-XX:ErrorFile=/var/log/services/druid/historical_pid_%p.log

-Xloggc:/var/log/services/druid/historical_gc.log

That should give you much more insight into what is happening. Also check how many segments are being allocated to your historical nodes. Linux has a default limit of roughly 65536 memory-mapped areas per process (vm.max_map_count), so if you have many small segments you could hit that limit.
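
For reference, a quick way to check this on the historical host (standard Linux commands; replace <historical_pid> with the actual process id):

sysctl vm.max_map_count

wc -l /proc/<historical_pid>/maps

The first shows the per-process mapping limit; the second shows roughly how many mappings the historical process currently holds. If you do hit the limit, it can be raised with something like: sysctl -w vm.max_map_count=262144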

Regards,

–Ben