[druid-user] The peons are DoS-ing the Coordinator. How to resolve this?

With a large number of MiddleManagers (300-600, each with 2 workers), it looks like they are DoS-ing the Coordinator with a flood of LDAP user cache lookups. See:
at org.apache.druid.query.lookup.LookupReferencesManager.fetchLookupsForTier(LookupReferencesManager.java:576) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.tryGetLookupListFromCoordinator(LookupReferencesManager.java:429) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.lambda$getLookupListFromCoordinator$4(LookupReferencesManager.java:407) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupListFromCoordinator(LookupReferencesManager.java:397) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupsList(LookupReferencesManager.java:374) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.loadAllLookupsAndInitStateRef(LookupReferencesManager.java:357) ~[druid-server-0.22.0.jar:0.22.0]
at org.apache.druid.query.lookup.LookupReferencesManager.start(LookupReferencesManager.java:157) ~[druid-server-0.22.0.jar:0.22.0]

We are running 5 Coordinators, each with 50 GB of RAM and 7 CPUs, and each configured with 500 threads. The Coordinator log looks very clean as well.

I tried changing the polling interval to 10 minutes, but the problem is still there.

How to resolve this issue?

I forgot to mention the Druid version: 0.22.0 with JDK 8.

I’m curious (so I can learn something) why you say LDAP? Do you use tiers for the lookups? Every peon on every MM is going to load lookups on startup, that’s a lot of lookups loading. If all the lookups aren’t needed for the peons, tiers might help.

Thankfully I was able to disable the lookup sync, which solved the first problem.

But it looks like there is a hard ceiling on how many MiddleManagers a cluster can run.
We are struggling to run 400-700 MiddleManagers with 2 Peons each.

This is due to the code in CoordinatorPollingBasicAuthenticatorCacheManager.
Every daemon type, including the Peon, polls the Coordinator for BasicAuth user information.
Eventually the Coordinator stops responding when so many Peons are coming up and going down, each sending this BasicAuth user info request. I tried many settings across the Coordinator, MiddleManager and Peon, but nothing prevents initUserMaps() from failing.

Can I skip initUserMaps() for Peons? I am thinking of adding a condition for that.

I put a condition in that basically says:

if service == 'peon' {
    // skip initUserMaps()
}

With that change, a test ingestion still completes.
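
For anyone trying the same thing, the condition above might look roughly like the following in Java. This is only a standalone sketch of the guard pattern, not the actual Druid code: the class name UserMapInitGuard, the role string "peon", and the Runnable standing in for initUserMaps() are all made up for illustration, and how the real cache manager learns its own service role is not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: skip an expensive startup fetch for one service role.
public class UserMapInitGuard {
    // Assumed role name; real Druid code identifies the node role differently.
    public static final String PEON = "peon";

    private final String serviceRole;
    private final Runnable initUserMaps; // stand-in for the real initUserMaps()

    public UserMapInitGuard(String serviceRole, Runnable initUserMaps) {
        this.serviceRole = serviceRole;
        this.initUserMaps = initUserMaps;
    }

    /** Runs the initial user-map fetch unless this process is a peon. Returns true if it ran. */
    public boolean start() {
        if (PEON.equals(serviceRole)) {
            return false; // skip the fetch on peons
        }
        initUserMaps.run();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        new UserMapInitGuard("peon", calls::incrementAndGet).start();
        new UserMapInitGuard("broker", calls::incrementAndGet).start();
        System.out.println("initUserMaps calls: " + calls.get());
    }
}
```

Note that the sketch only removes the startup fetch. Whether anything on the Peon still depends on the cached user map afterwards is a separate question it does not answer, so a passing test ingestion is worth backing up with a check of authenticated requests against the Peon.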