We’ve been having some trouble getting our coordinators to come up. They keep getting stuck somewhere in the startup sequence, right after connecting to the metadata table. Guice is unable to finish startup, and the pod keeps failing health checks until our k8s probes bounce it at the 10-minute mark.
We’re wondering if it might have to do with the number or size of our segments. We’re also looking into whether we might be missing some configuration on the coordinator. Some details about our cluster:
- We have 800k segments spread across 12 historical nodes.
- average segment size: 3.3 MB
- average rows per segment: 12.6k
- Our coordinators have 5 cores, with balancerComputeThreads set to 4
- We have auto-compaction enabled, with target set to 5 million rows
We’ve been noticing this instability during deployments, so the historical nodes coming up around the same time may also be a factor, but that’s unclear as well. As of now, we don’t have any limit on the number of segments being considered for movement, and we haven’t set maxSegmentsInNodeLoadingQueue.
If anyone has any insight into what’s going on in the startup process that might be causing this delay, or any parameters that we should try tweaking in the coordinator or elsewhere, that would be really appreciated.
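For context, if limits do turn out to matter, this is the shape of the coordinator dynamic config we’d be posting (the values are placeholders we haven’t tested, not recommendations):

```json
{
  "maxSegmentsToMove": 100,
  "maxSegmentsInNodeLoadingQueue": 300,
  "balancerComputeThreads": 4
}
```

As far as we can tell, this gets POSTed to the coordinator at `/druid/coordinator/v1/config` rather than set in runtime.properties.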
I wonder whether ZooKeeper is getting overloaded by the segment-info requests for all 800k segments. Your segments sound very small, so compaction should definitely help. Is the auto-compaction actually doing anything? Also, in the meantime (and this is a stab in the dark), I wonder whether using HTTP-based segment discovery instead of ZK might help, unless you already are. This was introduced here, but the parameters have changed a bit; see here, e.g., druid.serverview.type.
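If you do end up trying it, I believe the relevant properties are the ones below, assuming the newer config names — definitely double-check them against the docs for your version:

```properties
# common.runtime.properties — opt in to HTTP-based segment discovery/management
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
```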
Just a sanity check: the HTTP-based segment discovery is safe, right? It’s been 4 years since it was introduced, but I do see that note saying it’s for testing only and shouldn’t be used in production. We’re currently on 0.18.
Hi Chris - That’s a good question… I’ve seen people use it, and the main issue I remember was performance of different components (nothing like data problems). But that was more recently, and 0.18 is a couple of years old, so I’m not sure. I’d probably hold off on that (unless you can test thoroughly) and try to get segments compacted first.
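In case it helps, an auto-compaction config that targets bigger segments looks roughly like this (the dataSource name is made up, and the values are just a sketch; it goes to the coordinator’s compaction config API):

```json
{
  "dataSource": "your_datasource",
  "maxRowsPerSegment": 5000000,
  "skipOffsetFromLatest": "P1D"
}
```

The skipOffsetFromLatest keeps compaction away from the most recent interval so it doesn’t fight with ingestion.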
We also deploy Druid on Kubernetes; we’re on our own custom fork of 0.21.1, though…
On our largest cluster, we have > 3M segment files using Curator (ZK-based discovery). On another big cluster, we use the HTTP-based discovery for loading > 1M segment files.
The coordinator indeed took a while to boot up, maybe 10 minutes.
So when we change configuration, we delete the follower pods first, wait 10 minutes until they are all up, and then delete the old leader pod.
If your SSDs can handle it, I would jack up the number of segment-loading threads; we’ve set it to 100 in all of our clusters.
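That’s the historical’s segment-cache setting, if I’m remembering the property name right:

```properties
# historical runtime.properties — parallelize loading segments from deep storage / local cache
druid.segmentCache.numLoadingThreads=100
```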