Re: [druid-user] Real time ingestion tasks started hitting direct memory limit

The formula is a good starting point, but depending on the characteristics of your data, metrics, etc., you may need more direct memory buffers during the merging phase of indexing. If it's a real-time task, it also needs additional direct memory to serve queries.
It is unfortunately not very straightforward to estimate how much direct memory indexing will need, since it depends heavily on the data being ingested and many other factors.
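For reference, and assuming the formula in question is the standard sizing guidance from the Druid docs, the minimum direct memory a task needs works out to roughly:

  druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)

Treat that as a floor rather than an exact requirement; merging and real-time query handling can push actual usage above it.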

This setting is a maximum limit, so increasing it is not really harmful; still, it's better to keep it at a reasonable value based on running a few tests with your ingestion spec and dataset.

-XX:MaxDirectMemorySize places an upper bound on the amount of direct memory that can be allocated before the JVM throws an OOM error; it doesn't increase the amount of direct memory actually used. So if the task needs 5 GB of direct buffers, it will use only 5 GB of the available memory even though -XX:MaxDirectMemorySize=8g.
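As a purely illustrative example (the numbers are made up, not taken from your setup): with druid.processing.buffer.sizeBytes=500MiB, druid.processing.numThreads=2 and druid.processing.numMergeBuffers=2, the formula above gives 500MiB * (2 + 2 + 1), or roughly 2.5 GB of direct memory actually allocated; with -XX:MaxDirectMemorySize=8g, the remaining headroom simply goes unused.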

If the indexing task tries to allocate more direct memory and hits the MaxDirectMemorySize limit, the JVM throws an OOM error and the task exits gracefully. That means it will trigger things like -XX:+HeapDumpOnOutOfMemoryError (if set) to give us more information for diagnostics, and the bound also prevents the process from consuming all the memory on the machine, which could adversely affect other processes.
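If it helps, here is a sketch of how the task JVM flags could be set on the MiddleManager (runtime.properties); the heap/direct-memory sizes and the dump path below are placeholders, so adjust them to your environment:

  # Flags passed to each peon (task) JVM; MaxDirectMemorySize caps direct memory,
  # and the heap-dump flags capture diagnostics if the task OOMs at that cap.
  druid.indexer.runner.javaOptsArray=["-server","-Xms1g","-Xmx1g","-XX:MaxDirectMemorySize=8g","-XX:+HeapDumpOnOutOfMemoryError","-XX:HeapDumpPath=/var/tmp"]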

If MaxDirectMemorySize is set too high, the JVM won't limit the task when it tries to allocate more direct memory; if the machine then runs out of memory, the OS OOM killer will most likely hard-kill the process, which won't trigger a heap dump or any other end-of-lifecycle hooks. That's why it makes sense to set the limit to a reasonable value, but it's not overly harmful to increase it, especially when necessary.

Thanks and Regards,
Vaibhav