How to cache all historical data in memory

I would like to fit all of the historical data into the cache.

How big should the cache be to fit 1 GB of physical segments?

If I set “druid.cache.sizeInBytes” large enough, will there be no read requests to S3 because everything fits into the cache?

If I disable “druid.historical.cache.useCache”, would a Broker query then trigger an S3 request?

Hey Shilpa,

The druid.cache* configuration options refer to a query results cache, so this won’t influence whether segments are stored in memory or not.
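For context, that result cache is controlled by a handful of druid.cache* and druid.historical.cache* properties. A minimal sketch of what this might look like in a historical’s runtime.properties (the cache type and size here are illustrative placeholders, not recommendations):

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=268435456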

When a historical starts up, it loads segments from deep storage (S3 in your case) onto its local disk and mmaps those files.

You can configure the maximum total size of segments assigned to a historical through the “druid.server.maxSize” parameter. If you set this below the amount of free memory on the machine, the segment files will stay resident in the OS page cache. If the total size of the segments assigned to a historical is greater than the memory available, queries will incur disk I/O as the kernel shifts pages between disk and memory.
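As a sketch, assuming a historical with roughly 100 GB of disk set aside for segments, the relevant runtime.properties entries could look like this (the path and sizes are placeholders for illustration):

druid.server.maxSize=100000000000
druid.segmentCache.locations=[{"path":"/var/druid/segment-cache","maxSize":100000000000}]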

Best regards,

Dylan

Thank you Dylan Wylie!!!

Is “druid.server.maxSize” set based on disk size or memory?

druid.server.maxSize refers to the amount of disk space you want to make available for segments on that historical.

One concept you mentioned that I think is worth adjusting: queries will never trigger a read request from deep storage (S3). Queries can only serve data that has already been loaded from deep storage into the historical’s segment cache on local disk, which is bounded by druid.server.maxSize; this value is also what the historical reports to the coordinator as the maximum amount of segment data that should be assigned to it. Hence, you should make sure that the total of druid.server.maxSize across all of your historical nodes is greater than the amount of segment data you wish to be available for queries.
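As a rough sizing illustration (the numbers are made up): if you want 1 TB of segment data to be queryable and you run four historicals, each node needs druid.server.maxSize of at least 250 GB, since 4 x 250 GB = 1 TB, plus extra headroom if your load rules replicate segments across nodes.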

Hope this helps!

David

What actually controls whether the segments are loaded into memory or not? I am having trouble with our historical nodes not holding them in memory, even though there is available memory and over 40 GB of segments to host.

Thanks in advance,

Josh