Accessing MaxDirectMemorySize Requirements Before Running

Hello all!

I’m writing some scripts to automatically configure a Druid cluster based on the hardware of each node, and I’m struggling to come up with a formula for allocating direct memory to the MiddleManager, Historical and Broker processes. I’ve seen the formula relating MaxDirectMemorySize to sizeBytes, numMergeBuffers and numThreads, but I occasionally still get it wrong. Druid is obviously calculating its requirement at some point, so I wondered: is there a way to ask Druid for this requirement before start-up, rather than having to calculate it myself, render the jvm.config and runtime.properties files, and hope for the best?

Many thanks,

Ryan

Hi Ryan,

Tuning Druid is pretty tricky, and an area we are actively working to make as hands-off as possible, but we haven’t reached that point yet. I hope to open a PR very soon with a first improvement that should make this a bit easier to manage: it will auto-size the processing buffers based on the amount of direct memory allocated to the JVM. We also have ideas for follow-ups to improve things like MiddleManager task sizing. For now, though, I’ll try to offer some pointers.

The check that Historicals and Brokers perform to ensure they have enough direct memory uses the formula

druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)

which is mentioned in the performance FAQ (http://druid.io/docs/latest/operations/performance-faq.html). The MiddleManager itself does not use direct memory directly; rather, the peon processes it forks to run indexing tasks are the consumers here, so the above formula is multiplied by the worker capacity of the MiddleManager to calculate its requirement. Depending on your resource constraints, you will either want to drive the values of these settings from the amount of direct memory you have available, or set the amount of direct memory for the JVM based on the values you would like for these settings.
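
In script form, here’s a minimal sketch of that calculation in Python, along the lines of what your configuration script might do. The names are illustrative, just mirroring the druid.processing.* properties and druid.worker.capacity:

def required_direct_memory(size_bytes, num_merge_buffers, num_threads,
                           worker_capacity=1):
    # Direct memory needed, in bytes. For Historicals and Brokers,
    # leave worker_capacity at 1; for a MiddleManager, pass
    # druid.worker.capacity, since each forked peon needs its own
    # allocation of this size.
    per_process = size_bytes * (num_merge_buffers + num_threads + 1)
    return per_process * worker_capacity

# Example: 500 MiB buffers, 2 merge buffers, 7 processing threads.
needed = required_direct_memory(500 * 1024 ** 2, 2, 7)
print('-XX:MaxDirectMemorySize=%d' % needed)  # 5242880000 bytes, i.e. 5000 MiB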

Historical nodes also benefit greatly from having additional ‘free’ memory outside of the JVM heap and direct memory allocations, since it allows the memory-mapped segment files to stay in the OS page cache and reduces disk reads, so the more of this you can spare, the better.
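
As a rough illustration, and with hypothetical numbers rather than a recommended split, the budget for a 64 GiB Historical node might be sketched like this, reusing the function above:

total_ram = 64 * 1024 ** 3                   # physical memory on the node
heap = 8 * 1024 ** 3                         # -Xmx8g JVM heap
direct = required_direct_memory(500 * 1024 ** 2, 2, 7)  # ~5000 MiB, from above
free = total_ram - heap - direct             # left for the OS page cache
print('~%.0f GiB free for mmapped segments' % (free / 1024 ** 3))  # ~51 GiB

Whatever is left after the heap and direct memory allocations is what the operating system can use to keep segment files cached.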

The performance FAQ has additional details about what Druid uses the memory for: http://druid.io/docs/latest/operations/performance-faq.html. Best of luck with the script!