Hi, we are running very large heaps (70G) on our Druid 0.10.0 Historical nodes. Yet the heap still fills up, and the resulting full GC runs cause a multitude of problems, including failed queries and missed ZooKeeper heartbeats that make the node temporarily drop out of the cluster. We were using CMS for GC but switched to G1GC to see if that would help stem the issues. It did not.
What can we do to bring our Historical heap under control? Right now we have a 2G cache that allows caching of groupBy queries. We can stop that caching, but since the cache is only 2G, I'm not sure that will help much.
The docs indicate that groupBy queries use on-heap memory. Is that the case for v2 as well? From what I have read, v2 moved a lot of the memory off heap.
Any advice on reining in Historical heap usage and GC would be greatly appreciated.