How much RAM do historical nodes require for memory-mapping segments?

We have nearly 200GB of loaded segments under our current retention policy, which is set to 1 month. In other words, our queries always run against the last month's segments, which total around 200GB.
I have two historical nodes, each with 64GB of RAM, with the current set-up:
Xms → 8GB
Xmx → 12GB
MaxDirectMemorySize → 16G

and…

druid.processing.buffer.sizeBytes=512MiB
druid.processing.numMergeBuffers=3
druid.processing.numThreads=14

So technically the rest of the RAM should be available for memory-mapping segments. But the historical nodes constantly crash after a short period of time, specifically while loading the latest segments. The error is insufficient memory (the famous OutOfMemory).

My question is: since we always have 200GB worth of data, is my machines' RAM simply not enough?
I ask because when I lower the retention period to a few days, so that the segments for those few days shrink to 21GB, the historical nodes become stable.
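For reference, the memory budget implied by the settings above can be sketched roughly as follows. This is only a back-of-the-envelope estimate: it assumes 64GB means 64 GiB, uses the sizing rule from the Druid documentation that processing buffers need `buffer.sizeBytes * (numThreads + numMergeBuffers + 1)` of direct memory, and ignores OS and other process overhead. The function name is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def historical_memory_budget(total_ram, max_heap, buffer_size,
                             num_threads, num_merge_buffers):
    """Return (direct memory needed, RAM left for page cache), in bytes."""
    # Direct memory required by the processing and merge buffers.
    direct_needed = buffer_size * (num_threads + num_merge_buffers + 1)
    # What remains after heap and direct memory is what the OS can use
    # as page cache for memory-mapped segments (overhead ignored).
    page_cache = total_ram - max_heap - direct_needed
    return direct_needed, page_cache


# Values quoted in this post: 64GB RAM, 12GB max heap,
# 512MiB buffers, 14 processing threads, 3 merge buffers.
direct, cache = historical_memory_budget(
    total_ram=64 * GIB,
    max_heap=12 * GIB,
    buffer_size=512 * MIB,
    num_threads=14,
    num_merge_buffers=3,
)
print(direct / GIB)  # 9.0
print(cache / GIB)   # 43.0
```

Under these assumptions the buffers need 9 GiB, which fits under the 16G MaxDirectMemorySize, and about 43 GiB per node is left for page cache (36 GiB if the full 16G of direct memory were used). Across two nodes that is roughly 86 GiB of page cache for about 200GB of segments, so not every segment can stay resident at once.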

0.20

How many segments are there? Perhaps you are running into issues that are faced by people who have lots of very small segments – this would also affect the coordinator and broker which need more memory the more segments that you have.

As the recommended size is between 300-700MB for 5 million rows, I think that means you should have about 400?

There are nearly 70K segments.
I have also increased the Linux memory map limit (max map count) to 250K.
Average segment size is 6.8.
The number of rows is 8 billion, more or less.

Oh wow yes I would immediately recommend either a manual compaction job or to set up automatic compaction. Even with 8 billion rows, you should have only 1600 segments.
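The segment counts mentioned in this thread come from simple division, sketched below. It assumes the rule of thumb quoted above (about 5 million rows, or roughly 500MB as the midpoint of 300-700MB, per segment) and decimal units; the helper name is made up for illustration.

```python
def target_segment_count(total, per_segment):
    """Ceiling division: how many segments of the given size cover the total."""
    return -(-total // per_segment)


ROWS_PER_SEGMENT = 5_000_000          # rule-of-thumb rows per segment
BYTES_PER_SEGMENT = 500 * 1000 ** 2   # ~500MB, midpoint of 300-700MB

by_size = target_segment_count(200 * 1000 ** 3, BYTES_PER_SEGMENT)
by_rows = target_segment_count(8_000_000_000, ROWS_PER_SEGMENT)
current = 70_000

print(by_size)            # 400
print(by_rows)            # 1600
print(current / by_rows)  # 43.75
```

So even by the more generous row-based estimate, the cluster currently holds roughly 44 times as many segments as the guideline suggests.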

Thanks really!
So you're saying that a system with 64GB of RAM per node (two historicals, making 128GB in total) is adequate, and that compaction looks like the fix?

I am definitely going on gut here rather than data – but I suspect (!) this is a segment numbers versus segment volume issue… especially as you have managed to calm it down by reducing the date range you make available.

Something here may help you a little also:

You could use the SYS tables to profile your segment timeline a little; there are some examples here:
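As a rough sketch of that kind of profiling, the snippet below builds a request body for Druid's SQL API (POST to `/druid/v2/sql` on the Broker or Router) that counts segments and sums their size per time chunk from `sys.segments`. The datasource name `my_datasource` and the function name are placeholders, and the choice of columns and filters is only one possible way to slice it.

```python
import json


def segment_profile_payload(datasource):
    """Build a Druid SQL request body profiling segments per time chunk."""
    sql = (
        'SELECT "start", "end", '
        "COUNT(*) AS num_segments, "
        'SUM("size") AS total_bytes, '
        'AVG("num_rows") AS avg_rows '
        "FROM sys.segments "
        "WHERE datasource = ? "
        "AND is_published = 1 AND is_overshadowed = 0 "
        "GROUP BY 1, 2 "
        "ORDER BY 1"
    )
    # The SQL API accepts positional parameters alongside the query text.
    return {
        "query": sql,
        "parameters": [{"type": "VARCHAR", "value": datasource}],
    }


# "my_datasource" is a placeholder; substitute the real datasource name.
payload = segment_profile_payload("my_datasource")
print(json.dumps(payload, indent=2))
```

Time chunks with a high `num_segments` and a low `avg_rows` are the ones that would benefit most from compaction.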

thank you very much!