Handoff of Druid segments for realtime tasks

Are segments created by realtime tasks persisted to deep storage once the realtime task ends (i.e., after the "shut down time" for the tasks)? Are these segments handed off to historicals, which would then serve the data from deep storage?

I have data ingested into Druid through the Tranquility library. Realtime tasks are created when events are sent, and I can query the data using the Druid query API endpoint (druid/v2/).

However, if the Druid node fails before the realtime task finishes, would all of this data be lost? How can I ensure that data indexed by realtime tasks survives Druid node failures?

Thanks,

Prathamesh

If the Druid node fails before the realtime task completes, the task is considered failed, so no data from that task is written to Druid. As long as the retention period of your Kafka/Tranquility source has not elapsed, the data can be re-ingested from there, so no data is lost.

Hope this helps,

Ming

Hi Ming,

I have the same understanding. However, I was wondering whether it is possible to achieve high availability for the realtime tasks. Going by some earlier posts and answers by Gian, realtime tasks can be configured for HA. That post is quite old, though, so I am not sure whether it still holds true.

Reference: https://groups.google.com/forum/#!topic/druid-user/yfS9hYbA6dI

Basically, I am wondering whether the realtime tasks that are created can be made to run on more than one node, so that if a node processing a realtime task fails, another node can still hand off the segments after shut down time, thus not losing data. A realtime task can run for a long duration depending on the window period and granularity, and we wouldn't want to lose the data indexed by it.

Thanks,

Prathamesh

Would druid.indexer.storage.type or druid.indexer.runner.type help avoid data loss for realtime tasks?

Thanks,

Prathamesh

Hi Prathamesh,
Tranquility supports creating replica tasks for HA. In the event of the failure of one task, the replica task can continue to serve queries and carry on the ingestion without any data loss.

You would need to set replicants in your beam configuration.

See https://github.com/druid-io/tranquility/blob/master/core/src/main/scala/com/metamx/tranquility/beam/ClusteredBeamTuning.scala#L30
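For reference, a minimal sketch of a beam tuning with replication, assuming Tranquility's `ClusteredBeamTuning.Builder` API from the file linked above (the granularity, window period, and partition values here are illustrative and should be adjusted to your setup):

```scala
import com.metamx.common.Granularity
import com.metamx.tranquility.beam.ClusteredBeamTuning
import org.joda.time.Period

// Sketch only: with replicants(2), Tranquility creates two replica tasks
// per partition, so if the node running one task fails, the other replica
// can still serve queries and hand off segments after shut down time.
val tuning = ClusteredBeamTuning
  .builder()
  .segmentGranularity(Granularity.HOUR) // one segment interval per hour
  .windowPeriod(new Period("PT10M"))    // accept events up to 10 minutes late
  .partitions(1)                        // tasks per segment (write scaling)
  .replicants(2)                        // copies of each task, for HA
  .build()
```

Note that replication multiplies the number of indexing task slots consumed, so the cluster needs enough worker capacity for all replicas.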

Thanks Nishant!