Deleting Data Source - Druid

I am trying to delete a data source in Druid 0.12.1. Conceptually I understand I need to mark the segments as unused before I can delete them. Reading through this forum and other web sites, I tried:

Mark Unused

curl -X 'POST' -H 'Content-Type:application/json' -d '{ "interval" : "2019-07-11T14:15:00.000Z/2019-07-11T22:30:00.000Z" }' http://localhost:8081/druid/coordinator/v1/datasources/sensor-db-chris/markUnused

Delete

curl -X DELETE "http://localhost:8081/druid/coordinator/v1/datasources/sensor-db-chris/intervals/2019-07-11T14:15:00.000Z/2019-07-11T22:30:00.000Z"

These commands do not throw any errors, but files persist in the segments and segments-cache directories. Is this an issue in 0.12.1 or are my commands incorrect?

If the former, how can I delete the data source? Can I shut down the server and just delete the files?

Thanks,

Chris

Hi Chris,

The curl DELETE only removes the segments from the console. To delete them from deep storage as well, you need to submit a kill task.

Post JSON in the format below to the Overlord to delete from deep storage:

{
"type": "kill",
"dataSource": "",
"interval": ""
}

curl -X POST -H 'Content-Type:application/json' -d @<path-to-kill-spec.json> http://<overlord-host>:<overlord-port>/druid/indexer/v1/task
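The same two steps (write the spec, POST it to the Overlord) can be sketched as shell functions. This is a minimal sketch, assuming the Overlord listens on localhost:8090 (the usual port in the 0.12 quickstart; override OVERLORD_URL for your cluster). The function names and the spec file path are hypothetical. Keep in mind that a kill task only removes segments that are already marked unused.

```shell
# Hypothetical Overlord address; 8090 is the usual quickstart port.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"

# Print a kill task spec for a datasource and an ISO-8601 "start/end" interval.
kill_spec() {
  datasource=$1
  interval=$2
  printf '{"type": "kill", "dataSource": "%s", "interval": "%s"}\n' \
    "$datasource" "$interval"
}

# Write the spec to a file, then POST it to the Overlord's task endpoint.
# curl prints the JSON response, which contains the new task ID.
submit_kill_task() {
  spec_file=$1
  kill_spec "$2" "$3" > "$spec_file"
  curl -X POST -H 'Content-Type: application/json' \
    -d @"$spec_file" "$OVERLORD_URL/druid/indexer/v1/task"
}
```

For example: submit_kill_task /tmp/sensor-db-chris-kill.json sensor-db-chris 2019-07-11T14:15:00.000Z/2019-07-11T22:30:00.000Z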

Hi Naresh,

Thanks for your message. This didn’t seem to work for me. I created this file:

{
"type": "kill",
"dataSource": "sensor-db-chris",
"interval": "2019-07-11T14:15:00.000Z/2019-07-11T22:30:00.000Z"
}

in /opt/druid-0.12.1/sensor-db-chris-kill.json

Then I ran this command:

curl -X DELETE "http://localhost:8081/druid/coordinator/v1/datasources/sensor-db-chris/intervals/2019-07-11T14:15:00.000Z/2019-07-11T22:30:00.000Z"

Response was: {"task":"kill_sensor-db-chris_2019-07-11T14:15:00.000Z_2019-07-11T22:30:00.000Z_2019-09-30T16:03:28.840Z"}
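A response like this only confirms that a task was created, not that it deleted anything. A hedged sketch for checking the outcome, assuming the Overlord's task status endpoint (GET /druid/indexer/v1/task/{taskId}/status) on the quickstart port 8090; the helper names are hypothetical. A kill task that finishes without finding any unused segments in the interval may still report success while leaving the files in place.

```shell
# Hypothetical Overlord address; 8090 is the usual quickstart port.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"

# Pull the task ID out of a response such as {"task":"kill_..."}.
task_id_from_response() {
  printf '%s\n' "$1" | sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
}

# Build the Overlord status URL for a task ID.
task_status_url() {
  printf '%s/druid/indexer/v1/task/%s/status\n' "$OVERLORD_URL" "$1"
}

# Fetch the status JSON; a finished task reports SUCCESS or FAILED.
check_task() {
  curl -s "$(task_status_url "$1")"
}
```

For example, check_task "$(task_id_from_response "$response")" where $response holds the JSON returned by the DELETE or the task POST.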

However, data files still persist:

2019-07-11T14:15:00.000Z_2019-07-11T14:30:00.000Z
2019-07-11T14:45:00.000Z_2019-07-11T15:00:00.000Z
2019-07-11T15:00:00.000Z_2019-07-11T15:15:00.000Z
2019-07-11T15:15:00.000Z_2019-07-11T15:30:00.000Z
2019-07-11T15:30:00.000Z_2019-07-11T15:45:00.000Z
2019-07-11T15:45:00.000Z_2019-07-11T16:00:00.000Z
2019-07-11T16:00:00.000Z_2019-07-11T16:15:00.000Z
2019-07-11T16:15:00.000Z_2019-07-11T16:30:00.000Z
2019-07-11T16:30:00.000Z_2019-07-11T16:45:00.000Z
2019-07-11T16:45:00.000Z_2019-07-11T17:00:00.000Z
2019-07-11T17:00:00.000Z_2019-07-11T17:15:00.000Z
2019-07-11T17:15:00.000Z_2019-07-11T17:30:00.000Z
2019-07-11T17:30:00.000Z_2019-07-11T17:45:00.000Z
2019-07-11T17:45:00.000Z_2019-07-11T18:00:00.000Z
2019-07-11T18:00:00.000Z_2019-07-11T18:15:00.000Z
2019-07-11T18:15:00.000Z_2019-07-11T18:30:00.000Z
2019-07-11T18:30:00.000Z_2019-07-11T18:45:00.000Z
2019-07-11T18:45:00.000Z_2019-07-11T19:00:00.000Z
2019-07-11T19:00:00.000Z_2019-07-11T19:15:00.000Z
2019-07-11T19:15:00.000Z_2019-07-11T19:30:00.000Z
2019-07-11T19:30:00.000Z_2019-07-11T19:45:00.000Z
2019-07-11T19:45:00.000Z_2019-07-11T20:00:00.000Z
2019-07-11T20:00:00.000Z_2019-07-11T20:15:00.000Z
2019-07-11T20:15:00.000Z_2019-07-11T20:30:00.000Z
2019-07-11T20:30:00.000Z_2019-07-11T20:45:00.000Z
2019-07-11T20:45:00.000Z_2019-07-11T21:00:00.000Z
2019-07-11T21:00:00.000Z_2019-07-11T21:15:00.000Z
2019-07-11T21:15:00.000Z_2019-07-11T21:30:00.000Z
2019-07-11T21:30:00.000Z_2019-07-11T21:45:00.000Z
2019-07-11T21:45:00.000Z_2019-07-11T22:00:00.000Z
2019-07-11T22:00:00.000Z_2019-07-11T22:15:00.000Z
2019-07-11T22:15:00.000Z_2019-07-11T22:30:00.000Z
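To compare what is left before and after a kill task, the interval directories can be listed per datasource. This is a minimal sketch, assuming the segments and segments-cache directories both use a <dir>/<datasource>/<interval>/... layout, which matches the directory names listed above; the function name and arguments are hypothetical.

```shell
# List interval directories remaining for a datasource under a storage dir.
# Prints nothing if the datasource directory does not exist.
remaining_intervals() {
  storage_dir=$1
  datasource=$2
  dir="$storage_dir/$datasource"
  [ -d "$dir" ] || return 0
  for entry in "$dir"/*; do
    # An empty directory leaves the literal glob, which is not a directory.
    [ -d "$entry" ] && basename "$entry"
  done
  return 0
}
```

Running it against each directory (for example with remaining_intervals <segments-dir> sensor-db-chris) before and after the kill shows whether any intervals were actually removed.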

Open the developer tools in Chrome (or Firebug in Firefox) and watch the network traffic when you delete the datasource from the UI; the same requests will work from the command line. Note that after you delete (disable) a datasource, you still need to remove it with a kill task.

In the interval part of the JSON, give only the interval dates, not the full segment. Something like:

"2019-07-11/2019-07-12"
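An interval whose start and end are the same instant is empty and matches no segments, so it is worth checking before submitting a kill task. A minimal sketch of such a check, assuming both ends of the "start/end" string use the same precision (both plain dates, or both full UTC timestamps); the function names are hypothetical.

```shell
# Keep only the digits of a timestamp, e.g. 2019-07-11 -> 20190711.
digits_only() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Succeed only if an ISO-8601 "start/end" interval has start before end.
# Assumes both ends share one format, so their digit strings compare
# correctly as integers.
interval_is_nonempty() {
  case $1 in
    */*) ;;
    *) return 1 ;;   # not a start/end pair
  esac
  start=$(digits_only "${1%%/*}")
  end=$(digits_only "${1#*/}")
  [ -n "$start" ] && [ -n "$end" ] && [ "$start" -lt "$end" ]
}
```

With this check, 2019-07-11/2019-07-11 is rejected, while 2019-07-11/2019-07-12 (all of July 11) passes.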

The /druid/coordinator/v1/datasources/{data_source}/markUnused API seems to be available only from Druid 0.15.0 onward, so look at the documentation for earlier versions. The oldest I could get hold of is:

https://druid.apache.org/docs/0.13.0-incubating/tutorials/tutorial-delete-data.html. See if this helps.
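For releases without markUnused, the coordinator API documented DELETE endpoints that mark segments unused: one on the datasource itself (disables the whole datasource) and one on .../segments/{segmentId} (disables a single segment). A minimal sketch, assuming those endpoints exist in your version (check its docs) and that the coordinator runs at localhost:8081 as elsewhere in this thread; the function names are hypothetical. Disabling only marks segments unused; a kill task is still needed to remove them from deep storage.

```shell
# Coordinator address used elsewhere in this thread; override if needed.
COORDINATOR_URL="${COORDINATOR_URL:-http://localhost:8081}"

# URL that disables (marks unused) every segment of a datasource.
disable_datasource_url() {
  printf '%s/druid/coordinator/v1/datasources/%s\n' "$COORDINATOR_URL" "$1"
}

# URL that disables one segment, given the datasource and full segment ID.
disable_segment_url() {
  printf '%s/druid/coordinator/v1/datasources/%s/segments/%s\n' \
    "$COORDINATOR_URL" "$1" "$2"
}

# Send the DELETE that disables the whole datasource.
disable_datasource() {
  curl -X DELETE "$(disable_datasource_url "$1")"
}
```

For example, disable_datasource sensor-db-chris, followed by a kill task over the interval to delete the files.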

Hey Karthik, I have to do this from the command line on the server; I am not using the console in a web browser.

Sure. Once you identify how the delete requests are formed, you can send them with curl. I just meant that checking in the UI what the delete request should look like is straightforward, as an alternative to going through the documentation. That's how I figured it out, and it has been working for me.