My team runs a Druid cluster in production, and we’ve had several Overlord outages in the past 2 months. The Overlord process stays up but becomes unresponsive, and we have to kill -9 it.
Looking back at our logs, I see a pattern for the outages. All of them include these two lines:
2016-09-09T22:39:48,546 ERROR [Curator-LeaderSelector-0] org.apache.curator.framework.recipes.leader.LeaderSelector - The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock: /druid/indexer/leaderLatchPath
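For anyone who wants to check their own Overlord logs for the same signature, here is a minimal sketch. It assumes the exception line immediately follows the ERROR line, as in the excerpt above. The function name and the log file name in the usage note are made up for illustration.

```python
# Substrings copied from the two log lines above.
LEADER_ERROR = "LeaderSelector - The leader threw an exception"
LOCK_ERROR = "IllegalMonitorStateException: You do not own the lock"


def find_outage_signatures(lines):
    """Return the timestamp of each leader error that is directly
    followed by the 'You do not own the lock' exception line."""
    hits = []
    # Walk the log as (line, next line) pairs.
    for current, following in zip(lines, lines[1:]):
        if LEADER_ERROR in current and LOCK_ERROR in following:
            # The timestamp is the first whitespace-delimited token.
            hits.append(current.split(" ", 1)[0])
    return hits
```

Usage would be something like `find_outage_signatures(open("overlord.log").read().splitlines())`, where `overlord.log` is a placeholder for wherever your Overlord writes its log.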
Digging a bit deeper into Curator’s LeaderSelector class, I found a defect in Apache’s issue tracker that was recently resolved for Curator.
The fix was applied on July 28, 2016, and is included in Curator versions 3.2.1 and 2.11.1.
I checked our Druid version (0.9.1.1), and it uses Curator 2.10.0, which predates the fix.
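To make the version comparison explicit, here is a rough sketch that checks a Curator version string against the two fixed releases mentioned above. It assumes plain dotted numeric versions (no qualifiers such as -SNAPSHOT) and only knows about the 2.x and 3.x lines. The function names are mine, not part of Curator or Druid.

```python
# First releases on each line that include the fix, per the issue above.
FIXED_IN = {2: (2, 11, 1), 3: (3, 2, 1)}


def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def has_leader_selector_fix(version):
    """True/False for the 2.x and 3.x lines, None for any other line."""
    parsed = parse_version(version)
    minimum = FIXED_IN.get(parsed[0])
    if minimum is None:
        return None  # release line not covered by this sketch
    return parsed >= minimum
```

With these assumptions, `has_leader_selector_fix("2.10.0")` returns `False` and `has_leader_selector_fix("2.11.1")` returns `True`.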
Has anyone else experienced this error?
Any advice on upgrading Curator?