With a large number of MiddleManagers (300-600, each with 2 workers), they appear to be DoS-ing the Coordinator with a flood of LDAP user cache lookups. See the stack trace:
at org.apache.druid.query.lookup.LookupReferencesManager.fetchLookupsForTier(LookupReferencesManager.java:576) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.tryGetLookupListFromCoordinator(LookupReferencesManager.java:429) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.lambda$getLookupListFromCoordinator$4(LookupReferencesManager.java:407) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupListFromCoordinator(LookupReferencesManager.java:397) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupsList(LookupReferencesManager.java:374) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.loadAllLookupsAndInitStateRef(LookupReferencesManager.java:357) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.start(LookupReferencesManager.java:157) ~[druid-server-0.22.0.jar:0.22.0]
We are running 5 Coordinators, each with 50 GB of RAM and 7 CPUs, and each configured with 500 threads. The Coordinator log looks clean as well.
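For a rough sense of scale, here is a back-of-the-envelope sketch, not a measurement. It assumes every worker slot runs a task that fetches the lookup list from the Coordinator once at startup (which is what the trace above suggests), and that all slots start at about the same time. The function name and the "one fetch per slot" model are only illustrative assumptions.

```python
# Back-of-the-envelope estimate. The "one lookup fetch per worker slot at
# startup" model is an assumption based on the stack trace, not a measured value.

def startup_fetches(middle_managers: int, workers_per_mm: int) -> int:
    """Number of lookup-list fetches if every worker slot starts a task at once."""
    return middle_managers * workers_per_mm


COORDINATORS = 5
THREADS_PER_COORDINATOR = 500

low = startup_fetches(300, 2)
high = startup_fetches(600, 2)

# Upper bound on serving threads; assumes requests are spread evenly across
# all Coordinators, which may not hold in practice.
total_threads = COORDINATORS * THREADS_PER_COORDINATOR

print(f"simultaneous fetches: {low}-{high}")
print(f"threads on one Coordinator: {THREADS_PER_COORDINATOR}")
print(f"threads across all Coordinators: {total_threads}")
```

Under these assumptions, the worst case exceeds the 500 threads of a single Coordinator but not the combined total, so how the requests are distributed matters.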
I tried changing the poolingInterval to 10 minutes, but the problem still seems to be there.
How can I resolve this issue?