Trouble running kill tasks

Hello everyone,

I’m currently setting up a test data source that is an exact copy of our production data source; the only difference is a very low retention (so we can play with specs and other parameters without breaking production).

We are using druid-0.9.1.1, with a single node, filesystem for deep-storage, postgres for metadata, and an indexing service after moving away from real-time.

I’ve managed to set up proper rules, which are as follows:

[
  {
    "period": "PT3H",
    "tieredReplicants": {
      "_default_tier": 1
    },
    "type": "loadByPeriod"
  },
  {
    "type": "dropForever"
  }
]

My segments are dropped from the cache as they should be, but I’m now trying to automate a kill task to remove unused segments from deep storage.

I’ve noticed that the endpoint changed with 0.9.1 to druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?kill=true
but I couldn’t make this work, as the documentation at http://druid.io/docs/latest/ingestion/tasks.html is probably out of date.
Anyway, I’ve tried POST and GET requests with various interval values, and with segment identifiers in the URL or in the POST body (based on my research in the dev Google Groups). I’ve had a lot of 200 OK responses, but I’ve never seen my task being picked up in the overlord logs, and indeed my segments are still in deep storage.

Oh and the “used” column is correctly set to false for all the “dropped” segments in the druid-segments db.

So my questions are:

  • What’s the proper POST body to generate a kill task?
  • Can I set up dynamic intervals in the kill task, something like “from 1 month ago to 3 hours ago” (same as the drop rule), or do I have to generate specific intervals (with actual dates) myself?
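
If concrete intervals do turn out to be required, a small script can compute them at run time. The following is only a sketch: the function name `rolling_interval` is hypothetical, "1 month" is approximated as 30 days, and the timestamp format is assumed to match the ISO-8601 style Druid uses elsewhere.

```python
from datetime import datetime, timedelta, timezone


def rolling_interval(now, older=timedelta(days=30), newer=timedelta(hours=3)):
    """Return an ISO-8601 interval 'start/end' relative to `now` (UTC).

    `older` and `newer` are how far back the interval starts and ends;
    the defaults approximate "from 1 month ago to 3 hours ago".
    """
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    start = (now - older).strftime(fmt)
    end = (now - newer).strftime(fmt)
    return f"{start}/{end}"


if __name__ == "__main__":
    # Interval to use for a kill request issued right now.
    print(rolling_interval(datetime.now(timezone.utc)))
```

Run from cron or a similar scheduler, this would yield a fresh interval for each kill request.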

Let me know if you need more info/details.

Thanks in advance, all suggestions are welcome!

Charles

Hello,

There are two ways to nuke disabled segments from deep storage (assuming the segments are already marked unused in the metadata store):

  1. POST a kill task (described here: http://druid.io/docs/latest/ingestion/tasks.html) to the “Overlord” at <overlord_host>:<port>/druid/indexer/v1/task

  2. Use one of the HTTP endpoints on the “Coordinator”:

a. This one is marked as deprecated but works: call the “DELETE” method on “<coordinator_host>:<port>/druid/coordinator/v1/datasources/<datasource_name>?kill=true&interval=<interval>”, where <interval> should be a valid ISO-8601 interval.

b. Call the “DELETE” method on “<coordinator_host>:<port>/druid/coordinator/v1/datasources/<datasource_name>/intervals/<interval>”. One thing to note for this endpoint is that <interval> should not contain “/”; you have to replace “/” with “_”.
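
As a rough illustration of options 1 and 2b, the sketch below builds the JSON body for a kill task and the Coordinator URL with the “/” replaced by “_”. The helper names are hypothetical, the host, port and data source values are placeholders, and the body uses only the basic "type", "dataSource" and "interval" fields of a kill task; check the tasks documentation for your version before relying on it.

```python
import json


def kill_task_body(datasource, interval):
    """JSON body for a kill task, meant to be POSTed (Content-Type:
    application/json) to the Overlord at /druid/indexer/v1/task."""
    return json.dumps({
        "type": "kill",
        "dataSource": datasource,
        "interval": interval,  # ISO-8601 interval, e.g. "start/end"
    })


def coordinator_kill_url(coordinator, datasource, interval):
    """URL for option 2b; the '/' inside the interval becomes '_'."""
    return (
        f"http://{coordinator}/druid/coordinator/v1/datasources/"
        f"{datasource}/intervals/{interval.replace('/', '_')}"
    )


if __name__ == "__main__":
    interval = "2016-09-07T00:00:00.000/2016-09-28T00:00:00.000"
    print(kill_task_body("bar", interval))
    print(coordinator_kill_url("foo:8081", "bar", interval))
```

Either way, only segments that are already marked unused in the metadata store are affected.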

For automatic cleanup of unused segments, see the “druid.coordinator.kill.*” properties here: http://druid.io/docs/0.9.1.1/configuration/coordinator.html

-- Parag

Hey Parag,

I just ran a kill task:

curl -X DELETE http://foo:8081/druid/coordinator/v1/datasources/bar/intervals/2016-09-07T00:00:00.000_2016-09-28T00:00:00.000

and I could see from the Coordinator’s logs that it was accepted and finished successfully, but the Coordinator console is still listing these segments.

This is still the case after 10 minutes, which is longer than all of the Coordinator’s lifecycle periods.

I suspect that this kill task should be run after disabling a segment, because the kill task seems to have no effect at all:

  • /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}

I’m on Druid 0.9.1.1. Any idea what I’m doing wrong?

I see,

Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table.

I guess I’ll have to write a script for that, to get the segment IDs first and then disable them using those IDs.
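
Such a script could look roughly like the sketch below. It assumes the segment IDs have already been collected, and that disabling is done with a DELETE on the segments endpoint mentioned above. The helper name `disable_requests` is hypothetical, and the segment ID in the example is made up.

```python
from urllib.parse import quote
from urllib.request import Request


def disable_requests(coordinator, datasource, segment_ids):
    """Build one DELETE request per segment ID against
    /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}.

    Segment IDs are percent-encoded because they contain ':' characters.
    """
    base = f"http://{coordinator}/druid/coordinator/v1/datasources/{datasource}/segments/"
    return [Request(base + quote(seg, safe=""), method="DELETE") for seg in segment_ids]


if __name__ == "__main__":
    # Made-up segment ID, for illustration only.
    ids = ["bar_2016-09-07T00:00:00.000Z_2016-09-08T00:00:00.000Z_v1"]
    for req in disable_requests("foo:8081", "bar", ids):
        print(req.get_method(), req.full_url)
```

Each request can then be sent with `urllib.request.urlopen(req)`; once the segments show used = false in the metadata store, the kill task should pick them up.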

Kill tasks only impact segments that have been disabled first.