Avoiding OutOfMemoryErrors (heap space)

Hi, we occasionally get OutOfMemoryErrors (for heap space) in our historical and broker nodes. Is this possibly due to misconfiguration, or is it expected if we are pushing load beyond what the cluster can handle?

If they are expected, are they supposed to be recoverable? We tend to see problems around the times the OOM errors occur, and we often respond by restarting the cluster. (I don’t really have a solid set of symptoms to list here - it’s more anecdotal, and it could just be things downstream of the OOM errors themselves that might clear up if given time.)

The most recent example occurred after we updated our QTL file and were too impatient to wait for the polling to pick up the changes, so we restarted the historicals. One of the two nodes suffered multiple OOM errors over the next hour and a half (mostly from failing to load segments, it looks like, though it’s hard to tell exactly from the logs). We then rolled back the QTL file and restarted, though I’m not convinced the new QTL was the problem, as it seemed to load up just fine.

Are there particular things to look for in the Druid metrics to let us know that we are in danger of running out of memory, and/or clues to help figure out how much heap memory we ought to have? I mean, it may well just be that our cluster is underpowered, but I’d love to have some insight into how much we need to bump it up rather than relying on trial and error.

For general context: we’re running 0.9.1.1, with two historicals on m4.xlarges (2G heap, though I’m going to bump it to 3G) and two brokers on m4.larges (2G heap). We do a lot of groupBy queries and use a fairly large static QTL file (~174K rows, 13 lookup columns, some of which are fairly high cardinality). We have ~90G total in storage so far, and our segment sizes are fairly low (100M-200M; I don’t think any are greater than 300M).
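
For what it’s worth, the heap bump I have in mind is just a change to the historicals’ JVM flags, roughly like the sketch below. The heap numbers come from the plan above; the direct memory value is a placeholder I haven’t actually tuned (it needs to cover the processing buffers), so don’t read anything into it:

```
-server
-Xms3g
-Xmx3g
-XX:MaxDirectMemorySize=4g
```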

Hey Ryan,

OOMEs are not expected in a well-configured cluster. There are protection mechanisms in place that are intended to make individual queries fail when resources are low, rather than bringing down the whole server. But the defaults sometimes need to be tuned for these mechanisms to work properly.

The main things historicals and brokers use the heap for are result merging and (if you have them, and they are on-heap) QTLs. Odds are one of those is what’s causing you to run out of heap. Some things you could look at:

  1. For result merging, with the current groupBy engine, the main thing protecting your heap is the maxResults parameter (default 500k). You need to make sure that the number of results that could be in memory at once, across however many concurrent queries you run, won’t exceed what the heap can hold. This can be really challenging, but the good news is that the new v2 groupBy engine (in master) does result merging off-heap with strict controls on memory usage. So your best bets here are probably to be careful about maxResults or to try out the new groupBy engine (see the properties sketch after this list).

  2. For QTLs, you could consider moving them off-heap by setting the offHeap parameter (see the lookup config sketch after this list), so they aren’t contending for heap space. Or just be careful about how large they are when you load them and make sure they fit in the heap. A very rough estimate is that heap usage will be 2–4x the size of the underlying lookup file. Part of this is the overhead of the Java map data structures, and part is the fact that Java Strings are stored as UTF-16 rather than UTF-8. Or, you can figure out exactly how much heap they use by taking a heap dump, analyzing it with something like YourKit, and seeing how much space is taken up by lookup-related objects and anything they retain.
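
For item 1, the knobs I’m referring to are runtime.properties settings on the brokers and historicals. Here’s a minimal sketch, going from memory for 0.9.1-era configs, so treat the exact property names as assumptions and verify them against the configuration docs for your version:

```
# groupBy v1 safety valves (defaults shown; property names recalled from
# memory, so double-check them against the docs for your version)
druid.query.groupBy.maxResults=500000
druid.query.groupBy.maxIntermediateRows=50000

# Once you are on a build that includes the v2 engine, I believe it can be
# selected per query with the "groupByStrategy": "v2" context key.
```

Lowering maxResults trades query failures for heap safety: queries that produce too many buckets will error out instead of pushing the JVM toward an OOME.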
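
And for item 2, if you’re using the globally cached lookups, the off-heap switch is a cache-type setting rather than a per-lookup flag, if I’m remembering right. The property name below is from memory, so treat it as an assumption and confirm it in the docs for your version:

```
# Switch the global lookup cache from the on-heap map to the MapDB-backed
# off-heap cache (property name recalled from memory; verify for your version)
druid.lookup.namespace.cache.type=offHeap
```

As a back-of-the-envelope check while you’re still on-heap: if the lookup file is, say, 50MB on disk (an illustrative number, not yours), budget roughly 100-200MB of heap for it per the 2–4x estimate above.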

Great, thanks, some things for me to chew on.

Our QTL could definitely be a factor here. Does the loaded QTL use that amount of heap memory persistently, or only during load time? And if we move it off-heap, do we have to worry about increasing direct memory?

Also, trying out the new groupBy engine was already on my list. It just got bumped up a couple of notches. :slight_smile:

Hey Ryan,

On-heap QTLs use that amount of heap persistently, whether you are actively querying them or not. I believe the off-heap implementation uses memory-mapped files rather than direct memory, so you don’t need to increase direct memory. But you do need to be careful about the potential for swapping: because the lookups are file-backed, the OS may page them out under memory pressure, just like it can page out data segments.

The off-heap implementation uses MapDB’s memory-mapped files with a small-ish on-heap buffer.

Gian, one quick follow-up question. You wrote, “A very rough estimate is that you can expect heap usage to be 2–4x the size of the underlying lookup file.” We use a single lookup file that has 13 or 14 lookup columns. Would Druid be holding onto 13 or 14 copies of the whole file, or would it strip each lookup down to just its key/value pairs?

Each “copy” strips out the unused data. So the keys will be copied 13 or 14 times, but each value column is stored only once, in its own lookup.

Hey Ryan,

A “result” in this context is a grouping bucket. There’s no limit on the number of raw rows, since memory usage scales with the number of group buckets, not the number of inputs. With the v1 groupBy engine, if the number of group buckets is greater than maxResults, the query will fail.
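
To make that concrete, here’s a sketch of the kind of query that hits the limit. The datasource and dimension names are made up; under v1, if the number of distinct user_id buckets in the interval exceeds maxResults, the query fails rather than the whole server. If I recall correctly, the context can also lower maxResults for an individual query (though not raise it above the server config), which is what the last line shows:

```json
{
  "queryType": "groupBy",
  "dataSource": "events",
  "granularity": "all",
  "intervals": ["2016-09-01/2016-09-08"],
  "dimensions": ["user_id"],
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": { "maxResults": 100000 }
}
```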