Hi Team, sorry to bother you again, but maybe you can give me a hint.
In short, we have 134 datasources (visible in the coordinator UI console and reported by the coordinator datasources endpoint), but only 131 actually exist in our DB!
Long story: we had some errors on the Kafka Indexing Service, and I tried to modify the Kafka offset in the DB for the corresponding datasources to solve this (as the reset supervisor action didn’t work). As you might guess, there are no datasources in the DB for the failing tasks! One more thing: we are running a daily reingestion task and we don’t get any errors for these 3 datasources.
Can you please give me some paths to investigate? Where else could this be stored besides the DB?
Do you have any realtime tasks running for those 3 datasources?
Fwiw, the datasources list in the metadata store only represents those datasources which have at least 1 physical segment that was handed over to a historical node.
The coordinator UI shows datasources based on the segments being announced in ZooKeeper, which also includes segments announced by realtime nodes.
By default, these announcements are present in ZooKeeper at the druid/segments path. You can check the ZooKeeper announcements of those segments.
You can also get more info on which nodes are announcing those 3 datasources by sending an HTTP GET request to the endpoint below with an umbrella interval
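To see exactly which datasource names differ between the two views described above, a small script can diff them. This is only a rough sketch, not an official tool. It is not the interval endpoint mentioned above: it assumes the coordinator's datasource list at /druid/coordinator/v1/datasources returns a JSON array of names, and that the metadata table exposes the name in a dataSource column. The URL and the way you collect the DB names are placeholders to adapt.

```python
import json
from urllib.request import urlopen


def fetch_coordinator_datasources(coordinator_url):
    """Return the datasource names the coordinator currently reports."""
    # Assumption: this endpoint returns a JSON array of datasource names.
    url = f"{coordinator_url}/druid/coordinator/v1/datasources"
    with urlopen(url) as resp:
        return json.load(resp)


def missing_from_metadata(coordinator_names, metadata_names):
    """Names reported by the coordinator that have no row in the metadata table."""
    return sorted(set(coordinator_names) - set(metadata_names))


# Example usage (hypothetical host; metadata_names would come from something like
# SELECT dataSource FROM druid_dataSource in your metadata DB):
#   reported = fetch_coordinator_datasources("http://localhost:8081")
#   print(missing_from_metadata(reported, metadata_names))
```

In the situation described in this thread, the result would be expected to contain the 3 datasources that show up in the coordinator but not in the DB.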
So, the idea was to change the Kafka starting offset in the DB to force the tasks to start from the last offset.
I can confirm that for all 3 datasources we have the S3 segments, the entries in ZooKeeper, and confirmation from the historical nodes via your link. I can even see records in the DB for these in other tables (druid_pendingSegments, druid_segments, druid_supervisors) but nothing in “druid_dataSource”, which is very strange.
Funny enough, creating the record manually in the table “druid_dataSource” (with a convenient Kafka offset) and then starting the supervisor doesn’t work: the task gets the old offset, fails, and the entry is removed from the DB.
We sorted it out by cleaning the Kafka messages for the corresponding topics. After the Kafka indexing tasks ran successfully, I have all the records in datasources.