Historical cluster RAM requirement estimation

Hi,

we are using Druid 0.11 and have about 1.7 GB of segment data being added to the historical nodes every day.

What we noticed is that as we query a larger span of data, the historicals read it from disk and take much longer to come back with results. I am assuming that segments that are cached in memory on the historicals come back much faster than ones read from disk.

We would like to explore keeping more data in RAM. Is there a way we can estimate the cluster's RAM requirement based on the segment size?

Also, would a centralized cache be more efficient than adding more historical nodes?

Currently we have 5 x r4.2xlarge instances.

Thanks

ping

Hi Gaurav,

Apologies for the delay.

The historical servers' segment cache is on disk: segments are pulled from deep storage and placed in an on-disk segment cache. These files are then memory mapped, and the 'pages' of these files are cached by Linux in the 'free' memory. If the free memory is large enough to hold all of the segments assigned to a historical (i.e. bigger than druid.server.maxSize), then all of the data will effectively be in memory and disk I/O will be avoided. However, this often isn't realistic, so some tuning might be in order to achieve the performance that your use case requires.
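If you want to see how much of that 'free' memory Linux is actually using as page cache on a historical box, a quick sketch like this can show it (Linux-only, it just reads /proc/meminfo; the large 'Cached' number is where the mapped segment pages live):

# Rough sketch: how much memory Linux is using for the page cache on this box.
# Linux-only; /proc/meminfo values are reported in kB.
def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])  # value in kB
    return info

m = read_meminfo()
kb_per_gib = 1024.0 * 1024
print("MemTotal:            %.1f GiB" % (m["MemTotal"] / kb_per_gib))
print("MemFree:             %.1f GiB" % (m["MemFree"] / kb_per_gib))
print("Cached (page cache): %.1f GiB" % (m["Cached"] / kb_per_gib))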

If you're seeing your historical servers thrashing on disk, it means that the amount of data your queries are processing is larger than the free memory available, causing page faults that pull the missing segment data from disk into the page cache. If the JVM can spare it, you might be able to shrink the heap or direct memory allocation of the historical to give more room to the page cache; otherwise, adding additional servers is the best avenue for increasing performance. The performance FAQ has additional details: http://druid.io/docs/latest/operations/performance-faq.html. If you're more interested in certain parts of your data performing better than others, another potential avenue is 'tiering' your historicals using load/drop rules, which is somewhat described here: http://druid.io/docs/latest/operations/rule-configuration.html.

The cache in the more traditional sense, which I think is what you had in mind, is used exclusively for query results and is really only effective for repeated queries (but it can be very effective if you have a ton of them). See http://druid.io/docs/latest/configuration/index.html#cache-configuration if your workload does include many repeated queries.

Hello Clint Wylie,

Thank you for your response, it is very helpful.

So I understand now that adding a cache will only help with repeated queries; we will look into that option as well at some point in the future.

Coming back to Historical cluster estimation:

we are planning to use r4.2xlarge instances = 64 GB RAM and 8 vCPUs

so numThreads = vCPUs - 1 (one left for the OS) = 7

Direct memory size = druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1), which comes to about 4.2 GB

Heap = 250 MB * druid.processing.numThreads = 1.75 GB

memory_for_segments = total_memory - heap - direct_memory - jvm_overhead (~1 GB).

So I would have 64 - 4.2 - 1.75 - 1 (JVM overhead) - 1 (OS) ≈ 56 GB for segment mapping

So if my coordinator shows 1 TB of data (with replication of 2), I would need 1024 / 56 ≈ 18 machines to keep everything in RAM? (and keep druid.server.maxSize = 56 GB)
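To sanity check that arithmetic, here is the same estimate as a small Python sketch (the buffer size and merge-buffer count are assumptions from our planned config, not values we have measured):

# Rough sizing sketch for one r4.2xlarge historical; all inputs are planning assumptions.
total_ram_gb      = 64.0
num_threads       = 8 - 1        # vCPUs - 1 (druid.processing.numThreads)
buffer_size_gb    = 0.42         # druid.processing.buffer.sizeBytes, ~420 MB assumed
num_merge_buffers = 2            # druid.processing.numMergeBuffers, assumed

direct_memory_gb = buffer_size_gb * (num_merge_buffers + num_threads + 1)  # ~4.2 GB
heap_gb          = 0.25 * num_threads                                      # ~1.75 GB
jvm_overhead_gb  = 1.0
os_reserve_gb    = 1.0

memory_for_segments_gb = (total_ram_gb - direct_memory_gb - heap_gb
                          - jvm_overhead_gb - os_reserve_gb)               # ~56 GB
nodes_needed = 1024.0 / memory_for_segments_gb                             # ~18 for 1 TB
print("memory for segments per node: %.1f GB" % memory_for_segments_gb)
print("nodes to keep 1 TB in the page cache: %.1f" % nodes_needed)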

I was wondering whether mapping takes more RAM for a segment than the size the coordinator shows in the UI. Is there a way to get that number?

Also, the 1 TB is with a replication factor of 2. If I want to keep only 9 machines, would the historicals be smart enough to keep one copy in RAM and one copy on disk? And would the broker be smart enough to pick the right node?

Thank you for taking the time to help us.

ping

Sorry for the delay, and thanks for your patience.

So if my coordinator shows 1 TB of data (with replication of 2), I would need 1024 / 56 ≈ 18 machines to keep everything in RAM? (and keep druid.server.maxSize = 56 GB)

I think the coordinator console shows the replicated size on the main 'cluster' view, so this should be correct.

I was wondering whether mapping takes more RAM for a segment than the size the coordinator shows in the UI. Is there a way to get that number?

The size displayed for each segment in the datasources view of the coordinator console is the size of the segment that will be mapped on the historical nodes. The 'smoosh' file contained in the zipped segment package in deep storage is the file that gets mapped, and the size of the unzipped contents is the size displayed in the coordinator.
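If you want to double check this on a node, a rough sketch like the one below sums the unpacked segment cache on disk so you can compare it against the coordinator's numbers for that historical (the path is just an example; use whatever your druid.segmentCache.locations points at):

# Rough sketch: total on-disk size of a historical's local segment cache.
# The path is an example; substitute your druid.segmentCache.locations setting.
import os

SEGMENT_CACHE_DIR = "/mnt/druid/segment-cache"  # assumed path

total_bytes = 0
for dirpath, dirnames, filenames in os.walk(SEGMENT_CACHE_DIR):
    for name in filenames:
        total_bytes += os.path.getsize(os.path.join(dirpath, name))

print("local segment cache size: %.1f GB" % (total_bytes / 1024.0 ** 3))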

Also, the 1 TB is with a replication factor of 2. If I want to keep only 9 machines, would the historicals be smart enough to keep one copy in RAM and one copy on disk? And would the broker be smart enough to pick the right node?

Which 'pages' of the mapped file are cached in memory and which are not is controlled by the operating system, based on factors such as how recently they have been accessed; replication just controls how many copies of each segment are loaded across the historicals in the cluster. In other words, with replication of 2 and assuming there is not enough space to fit all segments in the page cache, any given segment may be 'in memory' on one, both, or neither of the historicals, and the broker doesn't currently have any way to know this.

Cheers,

Clint

Thank you for taking the time, Clint Wylie. This is very helpful.