How to drop a datasource?

Hello,

I’m writing some integration tests for the queries between my app and Druid.

I’d like to have this workflow:

  1. Send index_realtime request to the indexer to start accepting metrics

  2. Send a few metrics

  3. Make a few queries and assertions

  4. Delete the datasource

  5. Repeat

The first point wasn’t hard. I just sent the Realtime Index Task command and it worked. Steps 2 and 3 were also easy. Step 4 is the one I’m having trouble with. I tried to send DELETE coordinator/v1/datasource/:datasource, but it says that the given datasource does not exist. I’m unable to delete the datasource or its data, or even stop the firehose for that datasource.

Any suggestions?

Thanks,

Indrek

Hi,

  1. You would have to stop the firehose explicitly in this case by

POST http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/shutdown

(Ref: http://druid.io/docs/latest/design/indexing-service.html)

  2. Then use

DELETE http://<coordinator_ip>:<port>/druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval=1970/3000

(Ref: [http://druid.io/docs/latest/design/coordinator.html](http://druid.io/docs/latest/design/coordinator.html)) to wipe out the dataSource.

Note that this step will say “dataSource does not exist” if no segments have been “handed off” from the realtime node.

-- Himanshu
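
The two steps above can be sketched as a small test helper. This is only a minimal sketch in Python using the standard library: the function names, the base URLs passed in, and the choice to return the HTTP status instead of raising are assumptions for illustration, not part of Druid. Only the two endpoint paths and the `kill=true&interval=1970/3000` query come from the steps above.

```python
import urllib.error
import urllib.parse
import urllib.request


def shutdown_task_url(overlord, task_id):
    """Build the overlord URL that stops a running indexing task (step 1)."""
    task = urllib.parse.quote(task_id, safe="")
    return f"{overlord}/druid/indexer/v1/task/{task}/shutdown"


def delete_datasource_url(coordinator, datasource, interval="1970/3000"):
    """Build the coordinator URL that wipes out a dataSource (step 2)."""
    name = urllib.parse.quote(datasource, safe="")
    # Keep "/" unescaped so the interval reads 1970/3000 as in the step above.
    query = urllib.parse.urlencode({"kill": "true", "interval": interval}, safe="/")
    return f"{coordinator}/druid/coordinator/v1/datasources/{name}?{query}"


def send(method, url, timeout=30):
    """Send a body-less request and return the HTTP status code.

    HTTP error responses are returned as status codes instead of raised,
    so a test can tolerate the "dataSource does not exist" reply that
    appears when no segments have been handed off yet.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def reset_datasource(overlord, coordinator, task_id, datasource):
    """Hypothetical helper: stop the task first, then delete the dataSource."""
    stop_status = send("POST", shutdown_task_url(overlord, task_id))
    delete_status = send("DELETE", delete_datasource_url(coordinator, datasource))
    return stop_status, delete_status
```

A test teardown could then call something like `reset_datasource("http://localhost:8090", "http://localhost:8081", task_id, "metrics")`, where the hosts, ports, and dataSource name are placeholders for your own setup.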

Hi,

This works, but it is really slow to shut down and create tasks all the time. Is it possible to clear all the data from a realtime task instead?

Thanks,

Indrek

Hi Indrek, are you asking to drop all the data from the realtime task and reuse the same task? Currently there’s no way of doing that; you will have to shut down the task as Himanshu suggested.

Yes, that was exactly what I was asking. Thank you.

Do you think it would be hard to implement, and would you like this feature? If I have some free time I might look into it.

Indrek

Hi Indrek, can you clarify how slowly your tasks are shutting down? If you issue a task shutdown command, the task should stop immediately.

I guess the problem is more about starting it again. Shutdown + start takes around 10-20 seconds. I have around 100 tests at the moment (about 10 per data source). If I want to start each test from a clean slate, it will take at least 10*100 = 1000 seconds to run them. I’d rather start all the tasks once and just clean the data in every run.
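
As a rough check of that estimate, here is a tiny Python calculation using only the numbers quoted above (about 100 tests, 10 to 20 seconds per shutdown + start). The function name is made up for illustration, and the result ignores the time the tests themselves take.

```python
def total_restart_seconds(num_tests, seconds_per_restart):
    """Time spent only on task shutdown + start when every test restarts its task."""
    return num_tests * seconds_per_restart


low = total_restart_seconds(100, 10)   # best case: 10 s per restart
high = total_restart_seconds(100, 20)  # worst case: 20 s per restart
print(f"{low}-{high} s, i.e. about {low / 60:.0f}-{high / 60:.0f} minutes")
# prints "1000-2000 s, i.e. about 17-33 minutes"
```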

Ah, “flushing” a segment is not really supported right now and will require some code to be written. You might be able to get away with a new type of plumber that is only used in tests and you can programmatically have it do things you’d never want to do in production. Have you looked into why the shutdown + start is taking as long as it is? I’m a little surprised at those times, especially for low volumes of data.