I have a general question about GC selection and configuration for production Druid deployments.
I’ll start off by linking this production configuration example from druid.io.
In that example, the Historical and Broker both use CMS. Could a case be made for switching the Broker and Historical to G1GC? In our experience, CMS produces some long GC pauses, especially on the Historical node. When we switched to G1GC in our staging environment, those extended pauses on the Historical node stopped.
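For concreteness, the change I am asking about amounts to roughly the following in each service's JVM options. This is only a sketch: the lines starting with # are annotations for this post and not part of any file, heap and direct-memory settings are left out, and the pause target shown is just the JVM default rather than a tuned value.

```
# CMS, as in the linked example
-XX:+UseConcMarkSweepGC

# G1GC, the alternative in question (200 ms is the JVM default pause target, shown as an assumption)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
```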
Essentially, I am looking for the rationale behind the production cluster example linked above. Is there something I am not considering in wanting to migrate from CMS to G1GC? Switching to G1GC has often been our approach for other large-heap Java processes at my org.
I would love to hear what other people have experienced/attempted in terms of GC selection and tuning in production.