Hi! I’m not sure I understand the issue … but maybe it will help to know that Druid is just a sink - it is a normal consumer, subscribing to the stream. It records where it is in the stream in its own metadata - this offset is updated transactionally to guarantee exactly-once ingestion. You don’t have to do anything in particular…
Hey Abraham - sorry for the late reply.
I’ve been reliably informed that Druid stores the committed offsets (those of read and published segments) in the metadata database (the druid_dataSources table?).
On each iteration of its run loop, the supervisor fetches the list of partitions from Kafka and determines the starting offset for each partition: either the last committed offset, if it is continuing from a previous run, or the beginning or end of the stream, if the topic is new.
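To make that concrete, here is a minimal sketch (not Druid’s actual code - the function name and dict-based layout are my own) of the per-partition decision the supervisor makes on each pass:

```python
# Hypothetical sketch of supervisor start-offset resolution.
# For each partition: resume from the committed offset if one exists;
# otherwise fall back to the earliest or latest offset in the stream
# (roughly what Druid's useEarliestOffset setting controls).

def determine_start_offsets(partitions, committed, earliest, latest, use_earliest):
    starts = {}
    for p in partitions:
        if p in committed:
            starts[p] = committed[p]  # continue where the last run left off
        else:
            # new partition (or new topic): start from one end of the stream
            starts[p] = earliest[p] if use_earliest else latest[p]
    return starts

# Example: partitions 0 and 1 have committed offsets; partition 2 is new.
committed = {0: 120, 1: 95}
earliest = {0: 0, 1: 0, 2: 0}
latest = {0: 300, 1: 210, 2: 50}

print(determine_start_offsets([0, 1, 2], committed, earliest, latest, use_earliest=False))
# → {0: 120, 1: 95, 2: 50}
```

The key point is that the committed offsets always win when present, which is what makes a supervisor restart safe: it never re-decides earliest-vs-latest for a partition it has already published data from.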