[druid-user] Historical Memory issue

Hi team,

I am facing an issue with the Historical process filling up buff/cache on EC2 as the number of segments grows.

Here is my config:
druid.service=druid/historical
druid.plaintextPort=8083

# HTTP server threads

druid.server.http.numThreads=100

# Processing threads and buffers

druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=/mnt/disk2/var/druid/processing
druid.segmentCache.numLoadingThreads=50

# Segment storage

druid.segmentCache.locations=[{"path":"/mnt/disk2/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk3/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk4/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk5/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk6/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk7/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk8/var/druid/druidSegments", "maxSize": 6500000000000},{"path":"/mnt/disk9/var/druid/druidSegments", "maxSize": 6500000000000}]
druid.server.maxSize=50000000000000

# Query cache

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=memcached
druid.cache.sizeInBytes=256000000
druid.cache.hosts=:11211

But when I look at free memory:
free -m -h
total used free shared buff/cache available
Mem: 747G 21G 709G 868K 17G 722G
Swap: 0B 0B 0B

The buff/cache value keeps growing, and after a certain period it fills up the whole memory and the Historical server fails.

Can someone help me figure out what I am missing here?

Thanks,

Can you get the error from the Historical log? Druid does not load segments into memory until there is a query. Segments are memory-mapped and loaded into the page cache when a query executes; only the segments accessed by the query get loaded into the page cache.
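For what it's worth, a rough way to see how many memory mappings the Historical JVM currently holds is to count its entries in /proc (the pgrep pattern below is just an illustration, assuming the process was launched via org.apache.druid.cli.Main server historical; adjust it to your setup):

# count current memory mappings of the Historical process
wc -l /proc/$(pgrep -f 'org.apache.druid.cli.Main server historical' | head -1)/maps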

After a certain number of segments load, around 65k+, the Historical process fails, saying there is not enough memory for the JVM to run.

I am observing that on the new Historical nodes the buff/cache size keeps increasing.
total used free shared buff/cache available
Mem: 747G 22G 706G 868K 19G 721G
Swap: 0B 0B 0B

You need to set a larger heap. What is your current JVM heap?

It's 30 GB. I doubt it's a heap issue; it would have failed earlier... and still, why is buff/cache filling up?

65k+ sounds like it is hitting the limit on open files. What is your ulimit on open files? Set the ulimit on open files to a higher value. Also, once the Historicals are up, run a compaction task to get segment sizes to around 500 MB.
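In case it helps, one way to check and persistently raise that limit (the druid user name and the 200000 value are only examples; if the Historical runs under systemd, LimitNOFILE in the unit file is what actually applies):

# check the current soft limit for open files
ulimit -n
# raise it persistently for the service user (example values)
echo 'druid  soft  nofile  200000' >> /etc/security/limits.conf
echo 'druid  hard  nofile  200000' >> /etc/security/limits.conf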

vijay

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30446
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

When I looked at the failed instance, buff/cache was at 710 GB at the time of the failure.

I am not so sure... your open files limit is at 65535. That means that when ~65k segments are loaded this limit will be reached and the Historical will go down. You want to set open files to unlimited and try.

After some research I am also thinking the same: buffer cache should not be the cause. I've increased the nofile limit to 200k, and ingestion is running; will see how that goes.
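One thing worth verifying is that the new limit actually applies to the running JVM, not just to your shell (the pid lookup is illustrative):

# show the limits the running Historical actually has
grep 'open files' /proc/$(pgrep -f 'server historical' | head -1)/limits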

I am getting this error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f2f9ec20000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)

This can happen if you have exceeded the maximum number of memory-mapped files (see https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#system-configuration).

Increase the max map count to a larger number.
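As a sketch of what that looks like (the 500000 value below is illustrative; size it to comfortably exceed the number of segments you expect each Historical to hold):

# check the current per-process memory-map limit
sysctl vm.max_map_count
# raise it for the running kernel
sysctl -w vm.max_map_count=500000
# persist it across reboots
echo 'vm.max_map_count=500000' >> /etc/sysctl.conf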

Thanks, yeah, I did the same based on this post and the Stack Overflow thread "java - Native memory allocation (mmap) failed to map".
And it seems to work now.