[druid-user] Datasources logs

Hi all.
I have a question.
Where can I find information about created datasources beyond what sys.tasks offers? I need a more detailed record that shows which account created each datasource.

Thank you in advance!


Something like request logging, but to see who created a specific datasource?



Hi, Mark. Thanks for your attention.
We have a web interface that uploads files to Druid, and we want to record who uploaded each file, when, and from which IP address.
I found some audit info in sys.tasks and in the logs, but it’s not enough.

On Wednesday, August 24, 2022 at 00:42:37 UTC+5, mark.h...@imply.io wrote:

Hi amiros,

I’m not sure whether the contents of these tables cover your needs, but I just learned from the GitHub issue “[DISCUSS] Dropping the task audit log table from metastore” (apache/druid, Issue #5859) that the druid_audit and druid_tasklogs tables exist and might be helpful for you.
I don’t think these tables are accessible from the Druid UI under “sys”; as far as I know, they can only be queried directly in the Metadata Store database.
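For anyone who wants to check what druid_audit actually contains, here is a minimal Python sketch. It assumes the default table name and the columns audit_key, type, author, comment and created_date, and it takes any DB-API connection to the Metadata Store (for example one opened with a MySQL or PostgreSQL driver). The function name recent_audit_entries is made up for illustration. As far as I know this table mostly records configuration changes, so it may well not show who submitted an ingestion task.

```python
# Sketch only: the column names below are assumed from the default
# druid_audit table layout; verify them against your own Metadata Store.
AUDIT_QUERY = """
SELECT audit_key, type, author, comment, created_date
FROM druid_audit
ORDER BY created_date DESC
LIMIT {limit}
"""


def recent_audit_entries(conn, limit=20):
    """Return the newest audit rows using any DB-API 2.0 connection.

    `conn` is a connection to the Metadata Store database, not to Druid itself.
    """
    cur = conn.cursor()
    # int() keeps the LIMIT value numeric; no user-supplied text is interpolated.
    cur.execute(AUDIT_QUERY.format(limit=int(limit)))
    return cur.fetchall()
```

Each returned row is a tuple in the order of the SELECT list, so the author is the third element.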

Let us know if this helps. If not, let us know anyway; it would be useful to capture missing audit requirements for the Apache Druid project.


Thanks, Sergio!

Adding to what Sergio said, another colleague offered the following:

  • I wonder if they would need to look at the web logs for the API.
  • Which, for that matter, makes me wonder whether it’s logged somewhere in the Overlord log when someone submits an ingestion task.

If you have time and are able, please share your results. I reached out to several people about this question, and there’s some interest in documenting this.

Thanks, Sergio and Mark!

We decided to do something different. We do not have time to dig into Druid’s configuration without being sure that we will find what we want. Instead, we’ll create our own audit tables and write the data we need into them at file upload time.
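Roughly, the idea looks like the Python sketch below. It is only an illustration: the table name upload_audit, its columns, and the helper names init_audit and record_upload are all our own invention, and SQLite stands in for whatever database the web application really uses.

```python
import sqlite3
from datetime import datetime, timezone


def init_audit(conn):
    """Create the (hypothetical) application-side audit table if missing."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS upload_audit (
               id          INTEGER PRIMARY KEY AUTOINCREMENT,
               username    TEXT NOT NULL,
               source_ip   TEXT NOT NULL,
               datasource  TEXT NOT NULL,
               file_name   TEXT NOT NULL,
               task_id     TEXT,
               uploaded_at TEXT NOT NULL
           )"""
    )


def record_upload(conn, username, source_ip, datasource, file_name, task_id=None):
    """Insert one audit row for an upload and return its row id."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO upload_audit "
            "(username, source_ip, datasource, file_name, task_id, uploaded_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (username, source_ip, datasource, file_name, task_id, uploaded_at),
        )
    return cur.lastrowid
```

The web interface would call record_upload in the same request handler that submits the ingestion task, and storing the task ID returned by Druid would let these rows be matched against sys.tasks later.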

On Thursday, August 25, 2022 at 19:19:03 UTC+5, mark.h...@imply.io wrote:

That sounds good. It will be interesting for you to flesh out the requirements and implementation of that, and perhaps then look at contributing it back to the Apache Druid project.

One thought: the sys.tasks table covers some of your requirements but not all, so perhaps the schema of that table could be extended and the code that feeds it adjusted to add the missing columns.
It currently covers the “what” in the datasource field and the “when” in the created_time field.
It seems like you need username and source_ipaddr to complete the picture.
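To see what sys.tasks gives you today, a small Python sketch against the Druid SQL API might look like this. The URL is an assumption (a Router on localhost:8888, as in the quickstart), and the helper names build_tasks_request and fetch_tasks are made up for illustration; authentication is left out.

```python
import json
import urllib.request

# Assumption: a Router (or Broker) is reachable here; adjust for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_tasks_request(datasource, url=DRUID_SQL_URL):
    """Build (but do not send) a Druid SQL request listing tasks for one datasource."""
    body = {
        "query": (
            'SELECT "task_id", "type", "datasource", "created_time", "status" '
            'FROM sys.tasks WHERE "datasource" = ? ORDER BY "created_time"'
        ),
        # The datasource name is passed as a SQL parameter, not interpolated.
        "parameters": [{"type": "VARCHAR", "value": datasource}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tasks(datasource, url=DRUID_SQL_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_tasks_request(datasource, url)) as resp:
        return json.loads(resp.read())
```

Calling fetch_tasks("my_datasource") against a running cluster should return the task rows for that datasource, which confirms the point above: the result has the datasource and the creation time, but nothing about the user or the source IP address.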

I am no expert in the code, but it does seem like the Overlord is the one logging into the table and should be extendable if the missing info is readily available.