Data source doesn't exist in Aurora DB

Hi Team, sorry to bother you again, but maybe you can give me a hint.

In short, we have 134 datasources (visible in the coordinator UI console and reported by the coordinator datasources endpoint), but only 131 actually exist in our DB!

Long story: we had some errors on the Kafka Indexing Service, and I tried to modify the Kafka offset in the DB for the corresponding datasources to solve this (as the reset supervisor action didn’t work). As you can guess, there are no datasource records in the DB for the failing tasks! One more thing: we run a daily reingestion task and we don’t get any errors for these 3 datasources.

Can you please give me some paths to investigate? Where else could this be stored besides the DB?

Thanks,

Dan

Hi,

Do you have any realtime tasks running for those 3 datasources?

Fwiw, the datasources list in metadata store only represents those datasources which have at least 1 physical segment that was handed over to the historical node.

The coordinator UI shows datasources based on the segments being announced in ZooKeeper, which also includes segments announced by realtime nodes.

By default, these announcements are present in ZooKeeper at the druid/segments path. You can check the ZooKeeper announcements of those segments.
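A minimal sketch of that check, assuming the default base path /druid and any ZooKeeper client object that exposes get_children(path). kazoo's KazooClient is one such client; using it here is an assumption, not something from this thread. The sketch only lists which servers have announcement znodes and does not decode the znode payloads.

```python
def list_segment_announcers(zk, segments_path="/druid/segments"):
    """Return {server: [announcement znode names]} under the segments path.

    `zk` is any client with a get_children(path) method, for example a
    started kazoo KazooClient (an assumption, not part of this thread).
    """
    announcers = {}
    for server in sorted(zk.get_children(segments_path)):
        # Each server znode holds one child per batch of announced segments.
        children = zk.get_children(f"{segments_path}/{server}")
        announcers[server] = sorted(children)
    return announcers
```

With kazoo this could be called as list_segment_announcers(zk) after zk = KazooClient(hosts="zk-host:2181") and zk.start(), where zk-host:2181 is a placeholder.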

You can also get more info on which nodes are announcing those 3 datasources by sending an HTTP GET request to the endpoint below with an umbrella interval (one wide enough to cover all of the datasource's segments):

http://coordinator-ip:port/druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}/serverview
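A minimal sketch of calling this endpoint from Python, using only the standard library. The host, port, datasource name and interval in the usage note are placeholders, and replacing the "/" of an ISO 8601 interval with "_" in the URL path reflects my understanding of the coordinator API rather than anything confirmed in this thread.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def serverview_url(coordinator, datasource, interval):
    """Build the serverview URL for one datasource and interval.

    `coordinator` is "host:port". The interval is given in the usual
    "start/end" form and rewritten with "_" for use inside the URL path
    (assumed to be what the coordinator expects).
    """
    interval_in_path = interval.replace("/", "_")
    return (
        f"http://{coordinator}/druid/coordinator/v1/datasources/"
        f"{quote(datasource, safe='')}/intervals/{interval_in_path}/serverview"
    )


def fetch_serverview(coordinator, datasource, interval):
    """Send the GET request and return the decoded JSON response."""
    with urlopen(serverview_url(coordinator, datasource, interval)) as resp:
        return json.load(resp)
```

For example, fetch_serverview("coordinator-ip:8081", "my_datasource", "1000-01-01/3000-01-01") would query a deliberately wide umbrella interval.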

Yes, we do have realtime tasks for those 3 datasources: we are using the Kafka Indexing Service and we have supervisors for them, but the tasks are failing because of the date interval (see also https://groups.google.com/forum/#!topic/druid-user/DQzJqAF9rKc)

So, the idea was to change the Kafka starting offset in the DB to force the tasks to start from the last offset.

I can confirm that, for all 3 datasources, we have the segments in S3, the entries in ZooKeeper, and confirmation from the historical nodes via your link. I can even see records for them in other DB tables (druid_pendingSegments, druid_segments, druid_supervisors), but nothing in “druid_dataSource”, which is very strange.
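For anyone wanting to repeat this comparison, here is a rough sketch that counts rows per metadata table for one datasource through any Python DB-API cursor. The column names (dataSource, and spec_id for druid_supervisors) are assumptions based on the default Druid metadata schema, and matching spec_id against the datasource name assumes the supervisor id equals the datasource name; check both against your own schema.

```python
# Table name -> column assumed to hold the datasource name.
TABLES = {
    "druid_dataSource": "dataSource",
    "druid_segments": "dataSource",
    "druid_pendingSegments": "dataSource",
    "druid_supervisors": "spec_id",
}


def count_rows_per_table(cursor, datasource, placeholder="%s"):
    """Return {table: row count} for one datasource.

    `placeholder` is the driver's parameter marker: "%s" for the common
    MySQL drivers, "?" for sqlite3.
    """
    counts = {}
    for table, column in TABLES.items():
        # Table and column names come from the fixed dict above; only the
        # datasource value is passed as a bound parameter.
        cursor.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} = {placeholder}",
            (datasource,),
        )
        counts[table] = cursor.fetchone()[0]
    return counts
```

A datasource in the state described above would show non-zero counts everywhere except druid_dataSource.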

Thanks for your time, Nishant.

Funnily enough, creating the record manually in the table “druid_dataSource” (with a convenient Kafka offset) and then starting the supervisor doesn’t work: the task gets the old offset and fails, and the entry is removed from the DB.

We sorted it out by cleaning out the Kafka messages for the corresponding topics. Once the Kafka indexing tasks ran successfully, all the datasource records were there.

Still weird, but it is working just fine now.

Thanks,

Dan