Historical heap size config for data server

I would like to know about heap configuration for the Historical process on a data server.
Server details: 2 data servers of type i3.2xlarge.
As per the Druid site, the max possible heap is 24 GB. I have about 50 GB of data in Druid after all the tables are loaded.
Also, please note that there is no streaming ingestion, only batch ingestion.

(1) So, if I set the heap to 24 GB, does that mean that 24 GB of segments are loaded into the heap from deep storage (S3)?
(2) What happens to the remaining 26 GB (50 − 24)? Do they remain in S3, or are they loaded onto the SSD of the i3 instance?
(3) Is it possible to load all 50 GB of segments into memory so my queries are fast (sub-second)? What should the configuration be for this?
(4) Should I also configure direct memory for loading segments into memory?

Thanks in advance for the help.

An i3.2xlarge has 61 GB of RAM and 8 vCPUs. The heap is not used to load segments; it is used for computation. Direct memory is used for aggregation buffers. Total RAM − heap − direct memory = page cache available for memory-mapping segments. In your case I would set the heap to 4 GB (0.5 GB per vCPU) and direct memory to 13 GB (a buffer size of 1 GB and 4 merge buffers). With this you will have 61 − 17 = 44 GB available for memory-mapping segments. If you have 50 GB of data in total, this should be enough.
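As a rough sketch of what that sizing could look like on the Historical, assuming Druid's usual rule that `MaxDirectMemorySize` must cover `druid.processing.buffer.sizeBytes × (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)` (the exact values below, including `numThreads=8`, are illustrative, not a recommendation from the thread):

```properties
# jvm.config (Historical) -- hypothetical values matching the sizing above
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=13g

# runtime.properties (Historical)
druid.processing.buffer.sizeBytes=1073741824   # 1 GB per processing buffer
druid.processing.numThreads=8
druid.processing.numMergeBuffers=4
# Direct memory needed: 1 GB x (8 + 4 + 1) = 13 GB, covered by MaxDirectMemorySize above.
# Remaining RAM (61 - 4 - 13 = 44 GB) is left free for the OS page cache
# that backs the memory-mapped segments.
```

Double-check these names against the configuration reference for your Druid version before applying them.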

Thanks a lot @Vijay_Narayanan1.

Also, if I set the replication factor to 1, will that allow more data in memory? Would that be a good strategy, considering that it is only batch loading at the moment and losing one instance would have less of an impact?

I would recommend sticking with a replication factor of 2. Replication also affects how much work each server does in answering queries.


One last question related to the page cache you mentioned above.

Say there is not enough RAM to mmap all the data. The OS kernel will keep some of it in the page cache, and when required data is not in the cache, a page fault occurs and the needed data is read from disk into the page cache. Is this understanding correct?

Yes. If there is not enough RAM to memory-map all segments, then the segments will be paged in from disk.
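The on-demand paging behavior described above can be illustrated with a small, generic Python sketch (not Druid code): mapping a file reserves address space only, and the kernel faults individual pages into the page cache the first time they are touched.

```python
import mmap
import os
import tempfile

# Write a small file to stand in for a segment on disk (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "segment.bin")
with open(path, "wb") as f:
    f.write(b"A" * 4096 * 4)  # four 4 KiB pages of the byte 'A' (65)

with open(path, "rb") as f:
    # Mapping the file does not read it from disk yet; it only sets up
    # virtual address space backed by the file.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Touching a byte triggers a page fault on first access; the kernel
    # reads that page into the page cache and the access then proceeds
    # transparently, as if the data had been in memory all along.
    first = mm[0]
    last = mm[len(mm) - 1]
    mm.close()

print(first, last)  # prints: 65 65
```

If the page cache is under memory pressure, the kernel can evict these pages and fault them back in later, which is exactly the "paged from disk" behavior described above.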


Sorry to ask multiple questions on this, but does this mean a segment in Druid technically can't be larger than 2 GB because of the mmap limitation mentioned here? [JDK-6347833] (fs) Enhance MappedByteBuffer to support sizes >2GB on 64 bit platforms - Java Bug System

No. That buffer limit has nothing to do with segment size. Buffers are used to store intermediate aggregates, not segments.