Cannot remove a datasource

Hi all.
First of all thank you for giving Druid open source. It’s a great software.
I have set up a cluster with five nodes, each dedicated to a single Druid node type: Coordinator, Broker, Historical, Overlord, Realtime.
Using the Coordinator web UI, in the “Old coordinator console” mode, I first disabled a datasource, then tried to “Permanently Delete Segments” for that datasource.
I get a pop-up with this error:
{"error":"Exception occurred. Are you sure you have an indexing service?"}
Can you give me some hint?
Thanks,
Marco

We generally recommend not hard deleting data from deep storage, as deep storage is meant to serve as a permanent backup of your data. If you do want to hard delete from deep storage, you’ll need to set up a full indexing service.
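To sketch what that involves, assuming an Overlord and a MiddleManager on separate nodes: the minimal configuration looks roughly like the following (hostnames, ports, and worker capacity are placeholders, and property names may differ slightly between Druid versions, so check them against the docs for your release).

# Overlord runtime.properties
druid.host=overlord.example.com
druid.port=8090
druid.service=overlord
druid.indexer.runner.type=remote

# MiddleManager runtime.properties
druid.host=middlemanager.example.com
druid.port=8091
druid.service=middleManager
druid.worker.capacity=3

With the runner type set to remote, the Overlord hands tasks (including kill tasks) to the MiddleManager, which spawns peons to run them.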

Thank you Fangjin for your reply and Happy New Year!
The datasources I needed to destroy were created for testing Druid, and I needed to save some space. I will certainly not hard delete “real” data in the future …
I set up a full indexing service, with a MiddleManager running on a dedicated node, and everything now works: I am able to hard delete segments both from the “Old coordinator console” and by using the HTTP POST endpoint of the Overlord.
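For reference, the kill task I POST to the Overlord looks roughly like this (the dataSource name, interval, and hostname are placeholders for my actual values):

curl -X POST -H 'Content-Type: application/json' \
  -d '{"type":"kill","dataSource":"test_datasource","interval":"2014-01-01/2015-01-01"}' \
  http://overlord.example.com:8090/druid/indexer/v1/task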
Let me just point out that if you do not enter a time range in the “Interval” text box of the “Permanently Delete Segments” page, you still receive the message “Are you sure you have an indexing service?”, which is a little misleading.
Finally, I have a question: I have seen that the segment-related directories/files are not deleted from deep storage (I am currently using NFS).
Could you confirm this is the expected Druid behavior?
Can I manually delete those directories without corrupting anything?
(I have seen, however, that the segments are deleted from the MySQL metadata store.)
Thank you,
Marco

Hi Marco, inline.

Thank you Fangjin for your reply and Happy New Year!
The datasources I needed to destroy were created for testing Druid, and I needed to save some space. I will certainly not hard delete “real” data in the future …
I set up a full indexing service, with a MiddleManager running on a dedicated node, and everything now works: I am able to hard delete segments both from the “Old coordinator console” and by using the HTTP POST endpoint of the Overlord.
Let me just point out that if you do not enter a time range in the “Interval” text box of the “Permanently Delete Segments” page, you still receive the message “Are you sure you have an indexing service?”, which is a little misleading.

That’s a good point. Perhaps a better message is needed here. This is a great opportunity to contribute to the project BTW :slight_smile:

Finally, I have a question: I have seen that the segment-related directories/files are not deleted from deep storage (I am currently using NFS).
Could you confirm this is the expected Druid behavior?
Can I manually delete those directories without corrupting anything?

Druid will not hard delete any segments unless they are marked as unused. Double-check that these segments aren’t loaded or being used anywhere. If the MySQL entry is deleted, though, the files should be deleted from NFS. Were there any odd exception messages in the kill task?
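To double-check, you can query the metadata store directly. Assuming the default segment table name and a placeholder datasource, something like:

# list the datasource's segments and whether each is still marked as used
mysql -u druid -p druid -e \
  "SELECT id, used FROM druid_segments WHERE datasource = 'test_datasource';"

A kill task only deletes segments that are marked unused (used = 0) and that fall within the interval you give it.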

Hi Fangjin,
I agree with you: contributing to the project would be the right thing to do … I hope to be able to soon.

I have checked again: the segment files are indeed deleted from NFS, even though the parent directories named after the dataSource and the segment intervals are left behind on the NFS itself.
I was misled by the presence of the parent dir with the dataSource name and the intervals it contains :slight_smile:
Is this what you expect to happen?

I also have a question about the Overlord console (recall that in my configuration the Overlord runs on a different node from the MiddleManager):
while a task (index or kill) is in the “Running Tasks” list, I am able to click on the task’s “log (all)” hyperlink and see the log contents. As soon as it ends and moves to the “Complete Tasks” list, clicking on the same hyperlink returns:

No log was found for this task. The task may not exist, or it may not have begun running yet.
Is this related to the fact that the peon task runs on another node?

If useful, I am attaching a tgz with the logs of the indexing and kill tasks, and listings of the dataSource’s file system contents before and after the kill task ran.
Best regards,
Marco

logs.tgz (77.5 KB)

Inline.

Hi Fangjin,
I agree with you: contributing to the project would be the right thing to do … I hope to be able to soon.

I have checked again: the segment files are indeed deleted from NFS, even though the parent directories named after the dataSource and the segment intervals are left behind on the NFS itself.
I was misled by the presence of the parent dir with the dataSource name and the intervals it contains :slight_smile:
Is this what you expect to happen?

It is possible that for NFS, the kill task isn’t cleaning up parent directories. The task was originally created for S3, which doesn’t really have a concept of directories. A unit test to verify this behavior may help us debug things.
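In the meantime, if the leftover directories are empty, removing them by hand should not corrupt anything. A quick way to find them (the path is a placeholder for your druid.storage.storageDirectory):

# list empty directories left behind under the segment storage root
find /nfs/druid/segments -type d -empty

# the same command with -delete removes them, deepest first
find /nfs/druid/segments -type d -empty -delete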

I also have a question about the Overlord console (recall that in my configuration the Overlord runs on a different node from the MiddleManager):
while a task (index or kill) is in the “Running Tasks” list, I am able to click on the task’s “log (all)” hyperlink and see the log contents. As soon as it ends and moves to the “Complete Tasks” list, clicking on the same hyperlink returns:

No log was found for this task. The task may not exist, or it may not have begun running yet.
Is this related to the fact that the peon task runs on another node?

This is a common error message when task log backup has not been set up correctly.
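For an NFS setup like yours, the relevant properties on the Overlord and MiddleManagers would look roughly like this (the directory is a placeholder and should point at a path on the shared mount):

# store completed task logs on the shared file system
druid.indexer.logs.type=file
druid.indexer.logs.directory=/nfs/druid/indexing-logs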

You are right: I had not configured a task logs directory on deep storage.
Now everything works in the Overlord web GUI.
Thanks, and my apologies,
Marco

@Marco

Could you please file a GitHub issue about the parent directories not being cleaned up when using NFS as deep storage? I’m curious whether the same thing happens with HDFS deep storage.

Hi Charles,
I am not familiar with filing issues on GitHub.
If it is not a problem for you, could you give me some hints or a pointer to a doc page?
I presume I should register somewhere: do I register with the Druid project on GitHub, or with GitHub in general?
Thanks for your help,
Marco

Sure! Once you have a GitHub account, you can file an issue with the big green [New Issue] button at https://github.com/druid-io/druid/issues

If you click on the New Issue button without an account, you are given the option to either log in or create a new account.

If it turns out to be too much effort let me know and I can file the issue.

Hi Charles,
thanks for your help. Filed at:


Best,
Marco