Hello,
The historical service restarts at regular intervals, and the logs on the historicals mention out-of-memory errors, but I'm not sure what exactly is running out of memory.
Hi Naveen,
It's probably because your -Xmx and -XX:MaxDirectMemorySize settings, added together, are more memory than your server has. Try reducing them to fit.
We are running on a 64 GB machine, and these are the properties we set for the historical:
```
class { '::druid::historical':
  java_xms               => '2g',
  java_xmx               => '10g',
  XX:MaxDirectMemorySize => '10g',
  java_opts              => '-XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps',
  port                   => '8080',
  proc_buffer_size       => '650M',
  proc_num_threads       => '7',
  proc_num_mbuf          => '2',
  http_threads           => '50',
  caches                 => [
    ['/data/druid/index/historical', '100000000000'],
  ],
}
```
I ran the top command on the host and it still shows 12 GB of RAM available.
Bump.
Can anyone look into this and share their views?
Thanks
Hi Naveen,
MaxDirectMemorySize should be at least druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1).
See: http://druid.io/docs/latest/configuration/index.html#historical
proc_buffer_size => '650M',
proc_num_threads => '7',
proc_num_mbuf    => '2',
http_threads     => '50',
If I read the properties you shared correctly, the required MaxDirectMemorySize is 650M * (2 + 7 + 1) = 6500M, i.e. roughly 6.5G.
Can you reduce the -XX:MaxDirectMemorySize and -Xmx values and try?
Thanks,
Sashi
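For illustration only, here is the same arithmetic as a small Python sketch (the helper name is made up; the inputs are the values quoted above):

```python
# Sizing rule from the Druid docs linked above:
#   MaxDirectMemorySize >= sizeBytes * (numMergeBuffers + numThreads + 1)
MB = 1024 ** 2

def min_direct_memory(buffer_size_bytes, num_merge_buffers, num_threads):
    """Illustrative lower bound on -XX:MaxDirectMemorySize for a Historical."""
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)

# Values from the config above: 650M buffers, 2 merge buffers, 7 processing threads.
required = min_direct_memory(650 * MB, num_merge_buffers=2, num_threads=7)
print(required / MB)  # 6500.0 -> about 6.5G needed, versus 10g configured
```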
We updated the configs recently, but the error still persists.
Updated configs (a quick sanity check against the formula above is sketched after them):
Host1:
druid.processing.buffer.sizeBytes=680MB
druid.processing.numThreads=15
druid.processing.numMergeBuffers=4
MaxDirectMemorySize = 20GB
Xmx = 6GB
Total RAM on host = 47GB
Host2:
druid.processing.buffer.sizeBytes=680MB
druid.processing.numThreads=31
druid.processing.numMergeBuffers=8
MaxDirectMemorySize = 36GB
Xmx = 12GB
Total RAM on host = 700GB
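Checking both updated hosts against that same formula, as a sketch only (it treats 680MB as 680 MiB and only covers heap and direct memory):

```python
MB = 1024 ** 2
GB = 1024 ** 3

# name: (buffer bytes, merge buffers, processing threads, MaxDirectMemorySize, Xmx)
hosts = {
    "host1": (680 * MB, 4, 15, 20 * GB, 6 * GB),
    "host2": (680 * MB, 8, 31, 36 * GB, 12 * GB),
}

for name, (buf, merge, threads, direct, heap) in hosts.items():
    required_direct = buf * (merge + threads + 1)
    print(f"{name}: needs ~{required_direct / GB:.1f} GB direct, "
          f"configured {direct / GB:.0f} GB direct, heap+direct = {(heap + direct) / GB:.0f} GB")
# host1: needs ~13.3 GB direct, configured 20 GB direct, heap+direct = 26 GB
# host2: needs ~26.6 GB direct, configured 36 GB direct, heap+direct = 48 GB
```

By that rule both hosts appear to have enough direct memory configured, which suggests the OOM is coming from something other than the processing buffers alone.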
Both hosts are giving us the same errors. My guess is that when a thread tries to load a very large number of segments, around 40k, it runs out of memory and causes the service to restart periodically.
If you're getting this error, as opposed to a message that says you're out of JVM heap or direct memory, it could indicate that the combined memory usage of Druid and non-Druid processes on that machine periodically exceeds the total available memory (i.e., Druid may be within its configured limits, but those limits are too high given the other load on the system). Try reducing memory usage on the Druid side, and/or look into what else is consuming RAM on that system.
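A rough way to sanity-check that point, as a sketch (the headroom figure is an assumption, not a Druid recommendation):

```python
GB = 1024 ** 3

def druid_ceiling(xmx_bytes, max_direct_bytes):
    # Ignores thread stacks, metaspace, and JVM overhead, so the real peak can be higher.
    return xmx_bytes + max_direct_bytes

total_ram = 47 * GB   # host1 figures from the configs above
headroom = 10 * GB    # assumed allowance for the OS, page cache, and other processes
ceiling = druid_ceiling(6 * GB, 20 * GB)
print("fits" if ceiling <= total_ram - headroom else "too tight")  # prints "fits"
```

If a check like this passes but the machine still runs out of memory, the overhead not captured here (thread stacks, metaspace, page cache pressure, or other processes) is a good place to look.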
Thanks, Jon. That's exactly what I'm trying right now. I'll keep this thread updated if the error persists.
Thanks