tranquility resilience

From what I have seen, tranquility will make one attempt at starting realtime tasks per hour. If you miss your opportunity, you miss out on realtime data for an hour.

example would be:

you are restarting or briefly misconfigured the middleManager and at the time it was trying to start a new realtime task. you have to wait an hour until the tranquility tries again and the system is fully functional.

how do other’s deal with this or is there something I am missing?

for reference, we are running 0.7.1.1 and are going to upgrade to 0.8.1 soon.

this is where the issue has really been coming up because we have to be careful about upgrading one set of middleManagers and waiting an hour to then upgrade the next set.

Hey Arlo,

There are two ways of dealing with that currently- graceful shutdown and autoscaling. You can read about both here: http://druid.io/docs/latest/operations/rolling-updates.html

In the future we would like to make it possible to restart middleManagers and have the tasks resume from where they left off, but that is not currently possible.