Delete data source

Is there a way to delete data from all places (deep storage, metadata etc.) for a datasource?

Yes, you can take a look at the kill task: http://druid.io/docs/latest/Tasks.html

The following command doesn’t seem to work as I am expecting.

curl -X 'POST' -H 'Content-Type:application/json' -d "{ \"type\":\"kill\", \"id\":\"kill_task-myapp_V1-$(date --iso-8601=seconds)\",\"dataSource\":\"myapp_V1\", \"interval\":\"2015-01-01T00:00:00Z/2016-03-11T00:00:00Z\" }" localhost:8090/druid/indexer/v1/task


If I wanted to remove all data within a certain interval, what would be the steps? Do I first need to run a database update script on the "druid_segments" table ('UPDATE druid_segments SET used = false;') before running this kill command?

Kill Task

Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:

What options do I have if I want to completely delete a datasource?

The datasource needs to be disabled first. You’d do it this way:
curl -X 'DELETE' localhost:8082/druid/coordinator/v1/datasources/myapp_V1

This does set used=0 in the druid_segments table, but I don’t think you want to do that by touching the database directly.

(The update command you propose would disable all of the datasources, not just one of them - no distinction, of course, if you only have one.)

I tried your suggestion (using the default coordinator port, 8081). The "druid_segments" database table is now empty (I have only one datasource), but some segment information still seems to remain (command from http://druid.io/docs/latest/design/coordinator.html).

My goal here is just to delete segment information for a certain time range. I don't necessarily want to delete the datasource, though I will if there is no other option.

curl -X 'DELETE' localhost:8081/druid/coordinator/v1/datasources/myapp_V1

curl -X 'POST' -H 'Content-Type:application/json' -d "{ \"type\":\"kill\", \"id\":\"kill_task-myapp_V1-$(date --iso-8601=seconds)\",\"dataSource\":\"myapp_V1\", \"interval\":\"2015-01-01T00:00:00Z/2016-03-11T00:00:00Z\" }" localhost:8090/druid/indexer/v1/task

curl -X 'GET' localhost:8081/druid/coordinator/v1/datasources/myapp_V1

{"tiers":{"_default_tier":{"size":87466414,"segmentCount":2467}},"segments":{"maxTime":"2016-03-09T00:00:00.000Z","size":87466414,"minTime":"2015-11-01T00:00:00.000Z","count":2467}}
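To check whether any segments are still listed after a kill, the count can be pulled out of that response. This is only a minimal sketch, assuming the response keeps the shape shown above; segment_count is a hypothetical helper, and sed is used simply because it needs no extra tools.

```shell
# segment_count reads the datasource summary JSON on stdin and prints the
# value of the trailing "count" field (hypothetical helper, not part of Druid).
segment_count() {
  sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Sample response copied from the GET above.
response='{"tiers":{"_default_tier":{"size":87466414,"segmentCount":2467}},"segments":{"maxTime":"2016-03-09T00:00:00.000Z","size":87466414,"minTime":"2015-11-01T00:00:00.000Z","count":2467}}'

printf '%s\n' "$response" | segment_count   # prints 2467
```

Against a live coordinator you would pipe the output of the GET command into segment_count instead of the sample string.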


What are my options?

The following command also didn't seem to do what I am looking for.

curl -v -X 'DELETE' "localhost:8081/druid/coordinator/v1/datasources/myapp_V1?kill=true&interval=2015-01-01T00:00:00Z/2016-03-11T00:00:00Z"


I believe you can disable the datasource, delete the segments for a certain time range, then enable the datasource again (using curl -X 'POST' localhost:8081/druid/coordinator/v1/datasources/myapp_V1).
If you kill just some of the segments, the druid_segments table should not become empty - just the rows for the segments you kill should go away.

Does the following look good?

Disable Datasource

curl -v -X 'DELETE' "localhost:8081/druid/coordinator/v1/datasources/myapp_V1"

Delete Segments, Option 1 (taken from http://druid.io/docs/latest/design/coordinator.html)

curl -v -X 'DELETE' "localhost:8081/druid/coordinator/v1/datasources/myapp_V1?kill=true&interval=2015-01-01T00:00:00Z/2016-03-11T00:00:00Z"

Delete Segments, Option 2 (taken from http://druid.io/docs/latest/misc/tasks.html)

curl -X 'POST' -H 'Content-Type:application/json' -d "{ \"type\":\"kill\", \"id\":\"kill_task-myapp_V1-$(date --iso-8601=seconds)\",\"dataSource\":\"myapp_V1\", \"interval\":\"2015-01-01T00:00:00Z/2016-03-11T00:00:00Z\" }" localhost:8090/druid/indexer/v1/task

Enable Datasource

curl -X 'POST' "localhost:8081/druid/coordinator/v1/datasources/myapp_V1"


I noticed that the URL to disable a datasource is changing in future versions: https://github.com/druid-io/druid/blob/master/docs/content/design/coordinator.md.

Might you be getting caught by the & character? This command works for me if I use ' instead of " around the URL.

Yes, I think so, except for needing ' instead of " around the DELETE…kill.

It's the URL for deleting a segment using DELETE that's changing. The reason for the change is to make things more consistent, but it will have the beneficial effect of taking away the problem with the '&' character.
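For anyone hitting the same '&' problem, here is a small sketch of what the shell actually hands to curl. show_args is a hypothetical helper that stands in for curl and just prints its arguments. In a POSIX shell, straight single or double quotes both keep the '&' inside the URL. An unquoted '&', or typographic quotes pasted from a web page (which the shell does not treat as quotes at all), would let the shell cut the command at the '&' and run the first part in the background.

```shell
# show_args prints each argument it receives on its own line, in brackets,
# so you can see exactly what a command such as curl would be given.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Straight single quotes: nothing inside is interpreted by the shell.
URL='localhost:8081/druid/coordinator/v1/datasources/myapp_V1?kill=true&interval=2015-01-01T00:00:00Z/2016-03-11T00:00:00Z'

# Quoted expansion: the whole URL, including "&interval=...", stays one argument.
show_args -X DELETE "$URL"
```

If the last printed line stops at kill=true, the quoting was lost somewhere before the command reached the shell.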

Hi Mark,

I tried the solution mentioned above to delete the data for a particular time interval, but it is not working: the data is not actually deleted. Here are the commands I tried:

curl -v -X 'DELETE' "http://localhost:8081/druid/coordinator/v1/TestData/Reach Kp It Est"

curl -v -X 'DELETE' "http://localhost:8081/druid/coordinator/v1/TestData/Reach Kp It Est?kill=true&interval=2017-07-20T05:21:00Z/2017-07-20T05:26:00Z"

Can you please help me?

I want to delete particular rows.

Hi Sayali,

A kill task deletes only those segments of a datasource that have used=0 in your metadata store (used=0 means the segment is no longer in use, or was dropped by a configured rule).

There are two ways to mark segments as used=0 (the first option marks all the segments of the datasource):

  1. You can disable the datasource directly from the coordinator UI (http://:8081/): click on the datasource tab, then on the specific datasource, and you will see a disable button there (http://:8081/#/datasources/<datasource_name>).

  2. If you have configured MySQL as the metadata store, you can go to the druid_segments table and set the used column to 0 for all the segments you need to delete.

Once you have done one of the above steps, submit a kill task as below:

Send the request below to http://<Overlord_IP>:/druid/indexer/v1/task

{
  "type": "kill",
  "id": 15,
  "dataSource": "name of the datasource",
  "interval": "2016-12-01/2017-01-08"
}

(This deletes all the segments in this interval that are unused, i.e. used==0.)

If you delete all the segments of a datasource, that datasource will no longer appear in your coordinator console.
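A rough sketch of building that kill task body from the shell, using only the four fields that appear in this thread (type, id, dataSource, interval). kill_task_json and the task id format are made up for illustration, and the Overlord host and port in the commented curl line are placeholders for your own setup.

```shell
# kill_task_json prints a kill task body for one datasource and interval.
# The function name and the task id format are illustrative, not part of Druid.
kill_task_json() {
  datasource=$1
  interval=$2
  # Timestamp suffix so that repeated submissions get distinct ids.
  task_id="kill_${datasource}_$(date +%Y%m%dT%H%M%S)"
  printf '{"type":"kill","id":"%s","dataSource":"%s","interval":"%s"}\n' \
    "$task_id" "$datasource" "$interval"
}

# Print the body for inspection before sending anything.
kill_task_json myapp_V1 '2016-12-01/2017-01-08'

# Once the segments in that interval are marked used=0, submit it
# (OVERLORD_HOST and the port are placeholders):
# curl -X POST -H 'Content-Type: application/json' \
#   -d "$(kill_task_json myapp_V1 '2016-12-01/2017-01-08')" \
#   "http://OVERLORD_HOST:8090/druid/indexer/v1/task"
```

Printing the body first makes it easy to spot quoting mistakes before the task reaches the indexing service.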

Regards,

Arpan Khagram


Thank you, Arpan, for your reply.

But my question is different.

I just want to delete some specific rows from my datasource.

Can you please explain the procedure for doing that?

Thanks,

Sayali

Hi Sayali, I don't think you can do that with any NoSQL database. Segments are immutable once they have been created.