We’re using tranquility 0.5.0 / druid 0.7.3.
We recently experienced some issues with our ZK cluster, and had to restart our druid cluster as well.
After bringing everything back up, we noticed something pretty strange. All in-flight tasks were killed (as expected), and every middlemanager reported no active tasks except ONE, which still reported 17 active tasks. The tasks were all from the same time range (02-19T06:00), around the time we restarted things. I hopped on that middlemanager to check whether the tasks were running, and discovered that none of the peons associated with those tasks were actually running anymore. Both the overlord & middlemanager know about the tasks, and both try to shut them down:
On overlord startup, we see this. It looks like the overlord knows these tasks are old, but it can’t shut them down.