New Kafka ingestion worker (supervisor) configured to read from the beginning

What happens if you’ve configured a new supervisor to read from the beginning? (A sketch of such a spec follows the list below.)

  • There shouldn’t be any duplicate rows

  • The offsets will be stored in metadata storage

  • The offsets can become stale if data is purged from the Kafka topic; in that case you can lose data, but you still won’t process the same message twice
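
To make this concrete, here’s a minimal sketch of submitting such a supervisor spec to the Druid Overlord. The Overlord URL, topic, datasource, and column names are all assumptions for illustration; in a Kafka supervisor spec, `useEarliestOffset: true` in the `ioConfig` is what asks a brand-new supervisor to start from the earliest available offsets.

```python
# A minimal sketch, assuming a Druid Overlord at http://localhost:8081,
# a Kafka broker at localhost:9092, and a topic named "events".
import requests

OVERLORD = "http://localhost:8081"  # assumption: your Overlord or Router URL

spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "events",  # assumption: topic name
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            # This is the "read from the beginning" switch. It only applies
            # when metadata storage has no offsets for this datasource yet.
            "useEarliestOffset": True,
        },
        "dataSchema": {
            "dataSource": "events",  # the supervisor ID defaults to this
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user", "action"]},  # assumption
            "granularitySpec": {
                "segmentGranularity": "hour",
                "queryGranularity": "none",
            },
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Submitting the spec starts the supervisor; Druid records the consumed
# offsets in metadata storage as segments are published.
resp = requests.post(f"{OVERLORD}/druid/indexer/v1/supervisor", json=spec)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "events"}
```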

Other things to note:

  • If you’ve configured a new supervisor to read from the beginning, it won’t re-read from the beginning after every restart; on restart it resumes from the offsets stored in metadata storage

  • If you’re manually resetting running supervisors, there is a possibility of duplication

  • Resetting a supervisor clears its offsets from metadata storage, so some messages will be reread

  • If you update your supervisor spec and redeploy it, the updated supervisor should resume from the offsets stored in metadata storage (see the sketch after this list)
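
As a sketch of the lifecycle points above, the calls below inspect a supervisor’s status, hard-reset it, and note how an updated spec is redeployed. The Overlord URL and supervisor ID are assumptions; the endpoints themselves (`/status`, `/reset`, and re-POSTing the spec to `/druid/indexer/v1/supervisor`) are the standard Druid supervisor API.

```python
# A sketch of the lifecycle operations above, assuming the same Overlord URL
# and a supervisor whose ID is "events" (it defaults to the datasource name).
import requests

OVERLORD = "http://localhost:8081"   # assumption
SUPERVISOR_ID = "events"             # assumption

# On restart or redeploy, the supervisor resumes from the offsets recorded in
# metadata storage; the status payload shows the current offsets and lag.
status = requests.get(
    f"{OVERLORD}/druid/indexer/v1/supervisor/{SUPERVISOR_ID}/status"
)
status.raise_for_status()
print(status.json()["payload"])

# A hard reset clears the stored offsets. The supervisor then falls back to
# useEarliestOffset, so already-ingested messages may be read again; this is
# where the duplication risk comes from.
requests.post(
    f"{OVERLORD}/druid/indexer/v1/supervisor/{SUPERVISOR_ID}/reset"
).raise_for_status()

# Updating a spec is just re-POSTing it to /druid/indexer/v1/supervisor.
# Because that call does not clear stored offsets, the updated supervisor
# picks up where the previous one left off.
```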

Here’s more context.