Historical service restarts at regular intervals

Hello,

The historical service restarts at regular intervals, and the historical logs mention running out of memory, but I’m not sure what exactly is running out of memory.

Hi Naveen,

It’s probably because your -Xmx and -XX:MaxDirectMemorySize settings, added together, exceed the memory your server has. Try reducing them so they fit.
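As a rough illustration of that check (the numbers here are placeholders, not a recommendation; substitute your own -Xmx, -XX:MaxDirectMemorySize, and host RAM):

```python
# Rough budget check: heap + direct memory should leave room for the OS,
# other processes, and the page cache Druid relies on for segment data.
GB = 1024 ** 3

heap_bytes = 10 * GB        # -Xmx (placeholder)
direct_bytes = 10 * GB      # -XX:MaxDirectMemorySize (placeholder)
server_ram_bytes = 64 * GB  # total physical RAM on the host (placeholder)

jvm_ceiling = heap_bytes + direct_bytes
headroom = server_ram_bytes - jvm_ceiling

print(f"Heap + direct memory ceiling: {jvm_ceiling / GB:.1f} GB")
print(f"Headroom for OS, page cache, other processes: {headroom / GB:.1f} GB")

if headroom <= 0:
    print("Configured JVM memory exceeds physical RAM -- reduce -Xmx "
          "and/or -XX:MaxDirectMemorySize.")
```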

We are running on a 64 GB machine, and these are the properties we set for the historical:

```
class { '::druid::historical':
  java_xms               => '2g',
  java_xmx               => '10g',
  XX:MaxDirectMemorySize => '10g',
  java_opts              => '-XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps',
  port                   => '8080',
  proc_buffer_size       => '650M',
  proc_num_threads       => '7',
  proc_num_mbuf          => '2',
  http_threads           => '50',
  caches                 => [
    [ '/data/druid/index/historical', '100000000000' ],
  ],
}
```

I ran the top command on the host and it still shows 12 GB of RAM available.

Bump.

Can anyone look into this and share their views?

Thanks

Hi Naveen,

MaxDirectMemorySize should be at least = druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1).

See: http://druid.io/docs/latest/configuration/index.html#historical

```
proc_buffer_size => '650M',
proc_num_threads => '7',
proc_num_mbuf    => '2',
http_threads     => '50',
```

If I read the properties you shared correctly, MaxDirectMemorySize should be at least 650M * (2 + 7 + 1) = 6500M ≈ 6.5G.
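For reference, a small sketch of that calculation using the formula from the docs and the values quoted above (the variable names are mine):

```python
# Minimum direct memory per the historical docs:
#   MaxDirectMemorySize >= sizeBytes * (numMergeBuffers + numThreads + 1)
MB = 1024 ** 2

buffer_size_bytes = 650 * MB  # druid.processing.buffer.sizeBytes (650M)
num_threads = 7               # druid.processing.numThreads
num_merge_buffers = 2         # druid.processing.numMergeBuffers

min_direct = buffer_size_bytes * (num_merge_buffers + num_threads + 1)
print(f"Minimum MaxDirectMemorySize: {min_direct / MB:.0f}M "
      f"(~{min_direct / 1024 / MB:.1f}G)")
# -> Minimum MaxDirectMemorySize: 6500M (~6.3G)
```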

Can you reduce the -XX:MaxDirectMemorySize and -Xmx values and try again?

Thanks,

Sashi

We recently updated the configs, but the error still persists:

Updated configs :

Host1:

```
druid.processing.buffer.sizeBytes=680MB
druid.processing.numThreads=15
druid.processing.numMergeBuffers=4
MaxDirectMemorySize = 20gb
XMX = 6gb
Total ram on host = 47GB
```

Host2:

```
druid.processing.buffer.sizeBytes=680MB
druid.processing.numThreads=31
druid.processing.numMergeBuffers=8
MaxDirectMemorySize = 36gb
XMX = 12gb
Total ram on host = 700GB
```

Both hosts are giving us the same errors. My guess is that when a thread tries to load a huge number of segments (around 40k), it runs out of memory, causing the service to restart periodically.
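For what it’s worth, here is a quick check of the updated configs against the formula quoted earlier (units treated as decimal MB/GB for simplicity):

```python
# Direct-memory check for both updated hosts:
#   required >= sizeBytes * (numMergeBuffers + numThreads + 1)
hosts = {
    "Host1": {"buffer_mb": 680, "threads": 15, "merge_buffers": 4, "max_direct_gb": 20},
    "Host2": {"buffer_mb": 680, "threads": 31, "merge_buffers": 8, "max_direct_gb": 36},
}

for name, cfg in hosts.items():
    required_gb = cfg["buffer_mb"] * (cfg["merge_buffers"] + cfg["threads"] + 1) / 1000
    ok = required_gb <= cfg["max_direct_gb"]
    print(f"{name}: needs ~{required_gb:.1f} GB direct memory, "
          f"configured {cfg['max_direct_gb']} GB -> {'OK' if ok else 'too small'}")
# Host1: needs ~13.6 GB direct memory, configured 20 GB -> OK
# Host2: needs ~27.2 GB direct memory, configured 36 GB -> OK
```

By that rule both hosts appear to have enough direct memory, so the limit being hit may be something other than direct memory.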

If you’re getting this error, rather than a message that says you’re out of JVM heap or direct memory:

```
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
Possible reasons:
  The system is out of physical RAM or swap space
```

then it could indicate that, periodically, the memory usage of Druid plus non-Druid processes on that machine exceeds the total available memory (i.e., Druid may be within its configured limits, but those limits are too high given the other load on the system).

Try reducing the memory usage on the Druid side, and/or look into what else is consuming RAM on that system.
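A minimal sketch of that kind of check, assuming psutil is installed and using placeholder values for the configured heap and direct memory sizes:

```python
# Compare the historical's configured memory ceiling against what the host
# actually has, to see how much is left for the OS, page cache, and other
# processes. Requires: pip install psutil
import psutil

GB = 1024 ** 3

# Placeholders -- substitute your own -Xmx and -XX:MaxDirectMemorySize values.
heap_gb = 6
direct_gb = 20

mem = psutil.virtual_memory()
configured = (heap_gb + direct_gb) * GB

print(f"Total RAM:           {mem.total / GB:.1f} GB")
print(f"Available right now: {mem.available / GB:.1f} GB")
print(f"Druid heap + direct: {configured / GB:.1f} GB")
print(f"Left for everything else if Druid hits its limits: "
      f"{(mem.total - configured) / GB:.1f} GB")
```

If the remaining headroom is small, the JVM’s own native allocations (thread stacks, metaspace, GC structures) plus whatever else runs on the host can be enough to trigger the mmap failure above, even though the heap and direct memory settings themselves look reasonable.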

Thanks, Jon. That’s what I’m trying now. I’ll keep this thread updated if the error persists.

Thanks