Exactly-Once Semantics

I’ve been reading through the Druid docs and some of the discussions here, and I have a question about exactly-once semantics. With only one real-time node, is it still possible for points to be dropped or duplicated? My understanding is that duplication is only possible when there are multiple real-time nodes that don’t use the same sharding index. I don’t understand how points could be dropped.

Thanks in advance.

Hey Andre,

Even with a single realtime node, there are a couple of issues. Most Firehoses use the Runnable-based Committer, with which it’s possible to commit messages that have not yet been persisted, leading to drops. And with both the Runnable-based and metadata-based Committers, it’s possible for data to get persisted with older commit metadata, leading to duplicates.
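To make the two failure modes concrete, here is a toy simulation. It is only a sketch and is not Druid code: the `Source` class, the `disk` and `memory` lists, and both scenario functions are made up for illustration, and the only assumption is that a commit records the offset to resume from after a restart.

```python
class Source:
    """Hypothetical message source (not a Druid class)."""

    def __init__(self, n):
        self.messages = list(range(n))
        self.position = 0   # next offset to read
        self.committed = 0  # offset to resume from after a restart

    def read(self, count):
        batch = self.messages[self.position:self.position + count]
        self.position += len(batch)
        return batch


def drop_scenario():
    """Commit runs ahead of persist: unpersisted messages are lost."""
    src = Source(10)
    disk = []
    disk.extend(src.read(5))       # offsets 0..4 read and persisted
    memory = src.read(3)           # offsets 5..7 held in memory only
    src.committed = src.position   # commit fires at the read position (8)
    memory = []                    # crash: in-memory rows are gone
    src.position = src.committed   # restart resumes from offset 8
    disk.extend(src.read(10))      # only offsets 8..9 remain to read
    return disk


def duplicate_scenario():
    """Persist carries older commit metadata: rows are read twice."""
    src = Source(10)
    disk = []
    memory = src.read(5)           # offsets 0..4
    metadata = src.position        # commit metadata snapshot (5)
    memory += src.read(3)          # offsets 5..7 arrive before the persist
    disk.extend(memory)            # offsets 0..7 persisted with metadata 5
    src.committed = metadata
    src.position = src.committed   # crash and restart from offset 5
    disk.extend(src.read(10))      # offsets 5..9 are read again
    return disk


if __name__ == "__main__":
    print("drop:", drop_scenario())
    print("duplicate:", duplicate_scenario())
```

In the first scenario offsets 5 through 7 never reach disk. In the second they end up on disk twice. Both happen with a single node, because the only thing that goes wrong is the ordering between persisting and committing.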

The Appenderator interface is meant to solve these problems with the Plumber. It’s used by the Kafka indexing service task, which will be a new experimental feature in 0.9.1.

Awesome, thanks for such a quick reply! This really helps.