Facing issue in druid router - org.apache.druid.java.util.common.ISE: No default server found

Hi,

Our Druid instances (which are deployed as pods - broker/coordinator/router/historicals) are constantly reporting the below error message in the ‘Router’ logs:

2022-07-28T09:43:52,181 WARN [qtp2036295297-121] org.eclipse.jetty.server.HttpChannel - /druid/v2/sql
org.apache.druid.java.util.common.ISE: No default server found!

Not sure what is going wrong here. Most of the blog posts point to ZooKeeper as the culprit. The same instance was running perfectly fine 2-3 days ago, but all of a sudden it started to fail. Could you kindly help me with this?

Note: Tried restarting the ZK node along with the Druid services… but the same error is reported every time.

Thanks,
Keerthi Kumar N

Hey @keerthikumar welcome!!

That would sound right to me, too – the router isn’t able to find a default server.

Suggest that you check your running server properties on each server using this API:

You can see the Zookeeper config that each process is running using that – just as a check :slight_smile:
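
For example, something along these lines should do it (a rough sketch, assuming default ports and placeholder hostnames, so adjust to your pod names; each process serves its runtime properties at /status/properties):

curl http://<coordinator-host>:8081/status/properties | grep druid.zk
curl http://<broker-host>:8082/status/properties | grep druid.zk
curl http://<router-host>:8888/status/properties | grep druid.zk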

If it’s all ok, then I’d go on to check that each process is able to actually let Zookeeper know that it’s alive. I believe you’ll see those messages in each of the logs for each process – look for “curator” which is the java component that does the Zookeeper stuff.

Hello @petermarshallio . Thanks a lot for your response. Could you please elaborate a bit more on which exact runtime properties should be checked? I’m a newbie to Druid…

No worries!!!

The line you’re looking for is druid.zk.service.host – it’s what tells each process where Zookeeper is.

There’s a bit more about what Zookeeper’s doing here:

And going right in deep, here are ALL of the Zookeeper config options:

Awesome @petermarshallio … Below are the details:

druid.zk.service.host=zookeeper-service:2181
druid.zk.paths.base=/druid
druid.zk.service.compress=false

Also, when I tried to hit the APIs /druid/coordinator/v1/leader and /druid/coordinator/v1/isLeader, below is the response shown:

{"error":"Unable to determine destination for [/druid/coordinator/v1/isLeader]; is your coordinator/overlord running?"}

What could be the reason for this sir?

Thanks,
Keerthi Kumar N

Surprisingly, the error {"error":"Unable to determine destination for [/druid/coordinator/v1/isLeader]; is your coordinator/overlord running?"} is shown when I execute some of the APIs, and I believe that is where the root cause lies. Kindly help me with this.

Yep that definitely sounds like you’re getting to the root of it.

I wonder if your coordinator/overlord process is unable to advertise itself to Zookeeper – or whether the router process is likewise unable to reach Zookeeper to see what’s being advertised.
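
One quick check (just a sketch – the hostname is a placeholder and I’m assuming the default coordinator port 8081): query the coordinator directly, bypassing the router, to confirm the process itself is up and knows who the leader is.

curl http://<coordinator-host>:8081/status
curl http://<coordinator-host>:8081/druid/coordinator/v1/leader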

I seem to remember that you can ls inside Zookeeper itself using the command-line tools and actually see what is being advertised… but it has been a while since I actually did that hahah!! I think it may be better to look at the process logs and see what is in there about whether your processes are reaching ZK or not.
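
If you do want to poke around in Zookeeper, something along these lines should work (a sketch, assuming zkCli.sh is available on your ZooKeeper pod and druid.zk.paths.base=/druid as in your config; the exact child paths can vary by Druid version):

zkCli.sh -server zookeeper-service:2181
ls /druid
ls /druid/internal-discovery
ls /druid/internal-discovery/COORDINATOR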

Awesome… thanks a lot… let me do the above-mentioned checks. Hope I will be able to resolve the issue ASAP… Kindly don’t mind if I reach out to you again for some help :slight_smile:

Also, could you please let me know if there is a way to completely eliminate ZooKeeper with Druid and use Kubernetes instead?

Hi @keerthikumar, there is an experimental extension that does exactly that. You can read about it here and if you test it, we’d all love to hear about your results, so please share.
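
One thing worth double-checking when you try it: as far as I remember, the extension also has to be added to druid.extensions.loadList on every process, alongside whatever extensions you already load. A sketch, assuming the extension name is unchanged in your Druid version:

druid.extensions.loadList=["druid-kubernetes-extensions"]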

Thanks a lot @Sergio_Ferragut :slight_smile:

Hello @Sergio_Ferragut … thanks for your response. However, on following the steps mentioned in the link you shared, I am now getting the below exception when trying to restart the COORDINATOR pod of Druid.

Below are the changes I have included in all the Druid pods - broker/historicals/coordinator/router:

druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s

druid.discovery.k8s.clusterIdentifier=druid-staging
druid.discovery.k8s.podNameEnvKey=POD_NAME
druid.discovery.k8s.podNamespaceEnvKey=POD_NAMESPACE

Can anyone kindly help me ASAP on this issue, as it is blocking critical work.

2022-08-01T08:30:07,536 ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Expection while watching for NodeRole [BROKER].
org.apache.druid.java.util.common.RE: Expection in listing pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:staging:default\" cannot list resource \"pods\" in API group \"\" in the namespace \"staging\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
].
at org.apache.druid.k8s.discovery.DefaultK8sApiClient.listPods(DefaultK8sApiClient.java:94) ~[?:?]
at org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher.watch(K8sDruidNodeDiscoveryProvider.java:229) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
Caused by: io.kubernetes.client.openapi.ApiException: Forbidden
at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:971) ~[?:?]
at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:883) ~[?:?]
at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPodWithHttpInfo(CoreV1Api.java:30285) ~[?:?]
at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPod(CoreV1Api.java:30179) ~[?:?]
at org.apache.druid.k8s.discovery.DefaultK8sApiClient.listPods(DefaultK8sApiClient.java:83) ~[?:?]

The above exception is being thrown for BROKER and HISTORICALS as well

Thanks,
Keerthi Kumar N

Hi @keerthikumar, we might want to start a separate question to discuss this further so it does not get buried in a different subject.

In the meantime, the problem seems to be authentication or authorization related. I don’t know the answer, but I wonder whether it has anything to do with the “Gotchas” section of the extension’s description, where they mention the Kubernetes Role and Role Binding, which seems relevant.
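
If it is the RBAC piece, something roughly like this might unblock you (a sketch only, with placeholder role names; it assumes the pods run as the default service account in the staging namespace, which is what the 403 message suggests, and that get/list/watch on pods is enough for the discovery watcher):

kubectl create role druid-pod-reader --verb=get,list,watch --resource=pods -n staging
kubectl create rolebinding druid-pod-reader --role=druid-pod-reader --serviceaccount=staging:default -n staging

But do check the extension’s documentation for the exact permissions it expects.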

Let us know if this helps.

Sergio

Thanks @Sergio_Ferragut . Let me open a new request for this issue.