restoreTasksOnRestart behavior during rolling restarts

Hi, we’d like to use Druid 0.9.0’s realtime task restoration feature to make our rolling Druid upgrades and restarts easier. Currently, all of our realtime ingestion tasks run with a replication factor of 2 (two tasks total). If I restart one middle manager, will Tranquility buffer events for that middle manager’s tasks until they come back up, while still sending events to the tasks that remain running? I just want to confirm that at the end of a full rolling restart, all replicated tasks will have ingested exactly the same events. Thanks!

Hey TJ,

Each Tranquility batch is retried until it has been sent to all non-failed replicas, or until the send timeout is exhausted. Restorable tasks count as non-failed. So as long as the restarted task doesn’t take too long to start back up, the batch will go through once the task is running again. If you need more time, you can extend the timeout by adjusting druidBeam.firehoseRetryPeriod.
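To make the retry behavior concrete, here is a minimal Python sketch that simulates it. This is not Tranquility’s actual code (Tranquility itself is written in Scala). `Replica`, `send_batch`, `retry_period`, and `backoff` are hypothetical names, and `retry_period` only loosely stands in for druidBeam.firehoseRetryPeriod. Time is simulated instead of slept, and a replica marked `failed` is skipped, the way a non-restorable task would be.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one realtime task replica."""
    name: str
    down_until: float = 0.0  # simulated time at which a restarting task is back up
    failed: bool = False     # permanently failed (not restorable): never retried
    received: list = field(default_factory=list)

    def push(self, batch, now):
        # A restarting task rejects pushes until it has been restored.
        if now < self.down_until:
            return False
        self.received.append(batch)
        return True


def send_batch(batch, replicas, retry_period, backoff=1.0):
    """Retry `batch` until every non-failed replica has accepted it, or until
    `retry_period` simulated seconds have elapsed. Returns the names of the
    replicas that did NOT receive the batch."""
    now = 0.0
    pending = {r.name: r for r in replicas if not r.failed}
    while pending:
        for name in list(pending):
            if pending[name].push(batch, now):
                del pending[name]
        if not pending or now + backoff > retry_period:
            break
        now += backoff  # advance the simulated clock; a real client would sleep
    return set(pending)


if __name__ == "__main__":
    # Task restored after ~10 s: both replicas get the batch.
    print(send_batch("batch-1", [Replica("task-0"), Replica("task-1", down_until=10.0)], retry_period=60.0))
    # Task down longer than the retry period: it misses the batch.
    print(send_batch("batch-2", [Replica("task-0"), Replica("task-1", down_until=120.0)], retry_period=60.0))
```

A replica that comes back within the retry period still receives the batch, while one that stays down longer misses it, which is one way replicas can end up with different data.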

That said, even though this case should be handled well, it’s optimistic to expect “exactly the same events” given Tranquility’s overall design. It tries pretty hard to keep replicas in sync, but ultimately it is best effort and makes no strict guarantees. Part of the motivation behind the new Kafka-based ingestion work is to offer those guarantees.

OK, cool, that makes sense. Yeah, we’re pretty excited about the new Kafka ingestion task. Until then we do periodic batch reingestions, so this will definitely work for us. Thanks!