[druid-user] Druid broker failed to start after upgrade to 24.0

Hello,

I'm relatively new to Druid. We recently upgraded a Druid cluster from 0.12 to 24.0, and after the upgrade the Druid console failed to load properly.

I checked the logs, and it seems the broker has trouble starting up. Any suggestions on what I should look into next?

It looks like it is failing at line 221 of DruidLeaderClient:

06:15:11.948 [main] ERROR org.apache.druid.query.lookup.LookupReferencesManager - Error while trying to get lookup list from coordinator for tier[__default]
org.apache.druid.java.util.common.IOE: Retries exhausted, couldn’t fulfill request to [https://tap-druid-coordinator-alpha-1e5-ad7799a3.us-east-1.amazon.com:8281/druid/coordinator/v1/lookups/config/__default?detailed=true].
at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:221) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.discovery.DruidLeaderClient.go(DruidLeaderClient.java:127) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.fetchLookupsForTier(LookupReferencesManager.java:587) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.tryGetLookupListFromCoordinator(LookupReferencesManager.java:435) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.lambda$getLookupListFromCoordinator$4(LookupReferencesManager.java:412) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:129) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupListFromCoordinator(LookupReferencesManager.java:402) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.getLookupsList(LookupReferencesManager.java:379) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.loadAllLookupsAndInitStateRef(LookupReferencesManager.java:362) ~[druid-server-24.0.0.jar:24.0.0]
at org.apache.druid.query.lookup.LookupReferencesManager.start(LookupReferencesManager.java:162) ~[druid-server-24.0.0.jar:24.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_342]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_342]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:152) ~[druid-core-24.0.0.jar:24.0.0]
at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:136) ~[druid-services-24.0.0.jar:24.0.0]
at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:94) ~[druid-services-24.0.0.jar:24.0.0]
at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:63) ~[druid-services-24.0.0.jar:24.0.0]
at org.apache.druid.cli.Main.main(Main.java:112) ~[druid-services-24.0.0.jar:24.0.0]
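
The trace shows LookupReferencesManager giving up after retrying the coordinator URL /druid/coordinator/v1/lookups/config/__default?detailed=true. One way to tell a network or TLS problem apart from a coordinator problem is to request that same URL from the broker host, ideally with the same JVM and truststore settings the broker uses. Below is a minimal Java 8 sketch of that check. The class and method names are hypothetical, and it assumes the coordinator's certificate is trusted by the JVM's default truststore (for example one passed with -Djavax.net.ssl.trustStore=...). That assumption may not hold when TLS is handled by a custom extension.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical diagnostic (not part of Druid): request the coordinator lookups
 * URL that the broker failed to fetch and print the HTTP status or the error.
 * Uses only JDK 8 APIs so it can run on the same JVM version as the broker.
 */
public class LookupEndpointProbe {

    // Builds the URL pattern seen in the broker log:
    // https://<host>:<port>/druid/coordinator/v1/lookups/config/<tier>?detailed=true
    static String lookupsUrl(String host, int port, String tier) {
        return "https://" + host + ":" + port
                + "/druid/coordinator/v1/lookups/config/" + tier + "?detailed=true";
    }

    // Returns the HTTP status code; throws IOException on connection or TLS failure.
    static int probe(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: LookupEndpointProbe <coordinator-host> <port>");
            System.exit(2);
        }
        String url = lookupsUrl(args[0], Integer.parseInt(args[1]), "__default");
        try {
            System.out.println(url + " -> HTTP " + probe(url, 5000));
        } catch (IOException e) {
            // e.g. SSLHandshakeException (certificates/truststore) or
            // ConnectException (nothing reachable at host:port)
            System.out.println(url + " -> " + e.getClass().getName() + ": " + e.getMessage());
        }
    }
}
```

For example, run java LookupEndpointProbe <coordinator-host> 8281 on the broker host. An exception such as SSLHandshakeException or ConnectException points at TLS or networking between broker and coordinator; any HTTP status, including 401 or 403 when authentication is enabled, at least shows the coordinator is reachable.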

Hello,

we recently upgraded a druid cluster from 0.12 to 24

Can you please share how you upgraded? Did you follow all of the upgrading notes for each of the releases?

Best,

Mark

Thanks, Mark,

Since we have some custom scripts and build processes to make Druid work in our environments, I mostly copied over the new binaries and updated our extensions to work with the new version.

From my troubleshooting so far, the other nodes seem to start up OK; the broker is the only one that cannot open its port (8802) locally.
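Druid starts its HTTP server in a later lifecycle stage than LookupReferencesManager, so while the broker is still blocked retrying the coordinator lookup fetch shown in the trace above (or if startup fails outright), port 8802 would likely stay closed. To confirm whether anything is actually listening on that port, a small check like the following may help. It is a hypothetical helper written for illustration, not part of Druid, and it only tests whether a TCP connection is accepted.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: is anything accepting TCP connections on host:port? */
public class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;   // connection accepted: some process is listening
        } catch (IOException e) {
            return false;  // refused or timed out: nothing listening (or traffic filtered)
        }
    }

    public static void main(String[] args) {
        // Defaults to the broker port mentioned in this thread.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8802;
        System.out.println("localhost:" + port + " listening = "
                + isListening("localhost", port, 2000));
    }
}
```

If this reports false while the broker process is running, the broker has most likely not reached the stage where it opens its HTTP port, which points back at the startup errors in the broker log rather than at the network.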

JW

I think this was more of a new setup than an upgrade.

I was able to get the cluster up after turning on detailed logging and fixing a “java class loading error” that it revealed.

Hi JW,

I’m glad to hear that you were able to get the cluster up, and thank you for sharing the point about detailed logging.

If you ever have time and are able to share, would you mind posting a bit about what the detailed logging revealed? My team is currently working on some logging content (a free course and at least one upcoming talk), and the community can always benefit from specific examples.

Best,

Mark

Hello, Mark,

We have a custom extension that handles the TLS communication among our nodes internally, and that extension was NOT loading one of its dependencies properly.

That information was not visible at the default log level (INFO); it only appeared after I set the level to DEBUG.
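A missing extension dependency like this can also be checked outside Druid by trying to load the relevant classes from the extension's jars. The following is a minimal Java 8 sketch under stated assumptions: the class names passed in are placeholders (in practice taken from the DEBUG output), and a URLClassLoader over one directory of jars is only an approximation of how Druid isolates each extension, not Druid's actual loading code.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check: which of the given classes fail to load from a loader? */
public class ClassLoadCheck {

    // Maps each class name to "OK", or to the error it raised while loading.
    static Map<String, String> check(List<String> classNames, ClassLoader loader) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize=false: load and link the class without running static initializers
                Class.forName(name, false, loader);
                results.put(name, "OK");
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError, e.g. a missing transitive dependency
                results.put(name, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return results;
    }

    // Builds a class loader over every .jar in one directory (approximation of an extension dir).
    static URLClassLoader loaderForJarDir(File dir, ClassLoader parent) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, n) -> n.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ClassLoadCheck <extension-dir> <class-name>...");
            System.exit(2);
        }
        try (URLClassLoader loader =
                     loaderForJarDir(new File(args[0]), ClassLoadCheck.class.getClassLoader())) {
            List<String> names = Arrays.asList(args).subList(1, args.length);
            check(names, loader).forEach((name, status) -> System.out.println(name + " -> " + status));
        }
    }
}
```

For example, java ClassLoadCheck /path/to/extensions/my-tls-extension com.example.tls.SomeClass (both hypothetical) prints one line per class, so a NoClassDefFoundError naming a dependency shows which jar is missing from the extension directory.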

I got help from a co-worker who used to work at Imply as well.

Hope this helps.

Thank you.