We use Druid to ingest Kafka data.
I have a few basic questions:
1. I found a druid_datasource metadata table (hosted in Postgres in our case). I see a bunch of Kafka offsets stored in there. What does Druid use this table, and the offsets stored inside it, for?
2. Where does Druid get its Kafka offsets from? Does it get them from Kafka? Or does it maintain its own offsets?
3. What happens if I truncate the druid_datasource table?
4. When I reset the supervisor (useEarliestOffset=false), how does Druid know where to resume reading Kafka from? (kind of the same question as #2 I guess…)
Problem we’re trying to solve
We had our Kafka+Druid hosted in AWS. We’re migrating it to Google. The problem is that there is no way for us to reliably migrate the Kafka offsets from AWS to Google as is. We have to do some translation magic. But because of that, Druid is breaking because it’s looking for the old offsets. We’re trying to figure out how to get Druid to pick up the translated offsets, but we can’t tell whether Druid gets offsets from Kafka (consumer groups) or uses the ones it stores in the druid_datasource metadata table.
Hey Prayas: Druid stores offsets following the usual consumer pattern, updating them when it knows that it has safely ingested and started advertising the data from Kafka. As regards resets, there are a number of methods and you can see them documented here – things like resetOffsetAutomatically for example.
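In case it helps, here is a minimal sketch of two of those reset options, under some assumptions: the supervisor spec uses the nested "spec" layout, the Router is reachable at a URL like http://localhost:8888, and the supervisor id is "events". All three are placeholders, and the helper names are made up for illustration. The first helper turns on resetOffsetAutomatically in the tuningConfig. The second builds (but does not send) the POST to the supervisor reset endpoint, which clears the offsets Druid has stored for that supervisor.

```python
import json
import urllib.request


def with_auto_reset(spec: dict, enabled: bool = True) -> dict:
    """Return a copy of a Kafka supervisor spec with resetOffsetAutomatically set.

    Assumes the nested layout {"type": "kafka", "spec": {...}}.
    """
    updated = json.loads(json.dumps(spec))  # deep copy via a JSON round trip
    tuning = updated.setdefault("spec", {}).setdefault("tuningConfig", {})
    tuning["resetOffsetAutomatically"] = enabled
    return updated


def hard_reset_request(base_url: str, supervisor_id: str) -> urllib.request.Request:
    """Build, without sending, the POST that resets a supervisor's stored offsets."""
    url = f"{base_url.rstrip('/')}/druid/indexer/v1/supervisor/{supervisor_id}/reset"
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Placeholder spec fragment; a real spec also needs dataSchema, consumerProperties, etc.
    spec = {"type": "kafka", "spec": {"ioConfig": {"topic": "events", "useEarliestOffset": False}}}
    print(json.dumps(with_auto_reset(spec), indent=2))
    req = hard_reset_request("http://localhost:8888", "events")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Sending the reset is destructive in the sense that Druid forgets where it was, so after a reset the starting point falls back to whatever useEarliestOffset says.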
Okay, I just checked my druid_datasource table and it’s empty. My Druid is ingesting from Kafka. Where is it getting offset info from?
For example, I tried the following steps and Druid is able to remember the latest offset somehow:
My understanding (could be wrong) is that useEarliestOffset means use the earliest record found on the Kafka side. Setting it to false means use the most recent record it can find. I.e., start from the beginning (or end) of what you can find in Kafka now. If you mean where exactly it gets that from Kafka (records or some kind of Kafka metadata store), I’m not sure. Maybe that’s what you’re asking though…
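To make that concrete, here is a rough model of the behavior as the Druid docs describe it for useEarliestOffset: the flag only matters when the supervisor has no stored offsets for the datasource, and otherwise later tasks continue from where the previous ones ended. This is not Druid’s actual code. The function and argument names are made up for illustration, and it ignores the case where a stored offset is no longer available in Kafka.

```python
from typing import Dict, Optional


def choose_start_offsets(
    stored: Optional[Dict[int, int]],
    earliest: Dict[int, int],
    latest: Dict[int, int],
    use_earliest_offset: bool,
) -> Dict[int, int]:
    """Pick a starting offset per partition (illustrative model only).

    stored:   offsets kept by Druid for this datasource, or None if there are none
    earliest: earliest offsets currently available in Kafka, keyed by partition
    latest:   latest offsets currently available in Kafka, keyed by partition
    """
    fallback = earliest if use_earliest_offset else latest
    if not stored:
        # First run, or after a reset: the flag decides.
        return dict(fallback)
    # Stored offsets win; partitions without a stored offset use the fallback.
    return {p: stored.get(p, fallback[p]) for p in fallback}


if __name__ == "__main__":
    earliest = {0: 100, 1: 200}
    latest = {0: 900, 1: 950}
    print(choose_start_offsets(None, earliest, latest, use_earliest_offset=False))
    print(choose_start_offsets({0: 500, 1: 600}, earliest, latest, use_earliest_offset=False))
```

If this model holds, stale stored offsets (for example ones carried over from the old cluster) would take priority over whatever is in the new Kafka until they are cleared or replaced.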