Hi Diego,
The historical heap is used to store the following:
- Lookups
- Unmerged query results
- Per-segment and per-column information
Very large lookups can use a lot of heap. Complex queries covering a large interval can also require a lot of heap to hold unmerged results.
Typically, the last of these is not a big factor, since Druid stores maybe a few KB per segment and a few hundred bytes per segment-column in memory. However, if you have a lot of segments or very many columns per segment, this can add up. For example, with 100k segments and 1,000 columns per segment, you'd need roughly 10 GB of heap for this alone.
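For a rough sense of where that number comes from, here is a back-of-envelope sketch. The per-segment and per-column byte counts are rough assumptions based on the figures above, not exact Druid internals:

    # Rough heap estimate for per-segment / per-column metadata (assumed sizes).
    segments = 100_000
    columns_per_segment = 1_000
    bytes_per_segment = 5_000   # "a few KB per segment" (assumption)
    bytes_per_column = 100      # "a few hundred bytes per segment-column" (assumption)

    total_bytes = (segments * bytes_per_segment
                   + segments * columns_per_segment * bytes_per_column)
    print(f"~{total_bytes / 1e9:.1f} GB")   # prints ~10.5 GB

The segment-column term dominates, which is why very wide segments add up so quickly.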
If you are encountering OOMs, first figure out where they are happening; the stack trace should have clues. Most likely, though, you will need to increase the heap available to the historical process.
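For reference, the historical's heap is set via JVM flags in its jvm.config file (in the standard clustered layout that is conf/druid/cluster/data/historical/jvm.config, but your path may differ). For example, to give it a 12 GB heap you would set something like:

    -server
    -Xms12g
    -Xmx12g

Keep in mind that the processing and merge buffers live in direct (off-heap) memory, sized separately via -XX:MaxDirectMemorySize, so raising -Xmx alone won't help if the error is about direct memory.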
Thanks,
Max