Problem killing the old tasks

I am using the following command to kill tasks:
curl -X DELETE -H "Content-Type: application/json" -d @killTask.json http://10.137.209.146:8080/druid/indexer/v1/task/

The kill task JSON is as follows:

{
  "type": "kill",
  "id": "index_realtime_C1_2016-06-26T03:00:00.000Z_16_0",
  "dataSource": "C1",
  "interval": "2016-06-25T20:00:00.000-07:00/2016-06-25T21:00:00.000-07:00"
}

Around 10 old tasks from a few hours ago piled up, and I wanted to kill them using the above command. I am sending the kill request to the overlord, but the tasks are not getting deleted.

I also tried the command below:

curl -X DELETE "http://10.137.209.149:8080/druid/coordinator/v1/datasources/C1?kill=true&interval=2016-06-25T20:00:00.000-07:00/2016-06-25T21:00:00.000-07:00"

I am not getting any error, but the tasks are not getting deleted.

I am using the latest Tranquility.

Thanks

Bhaskar

Are you sending this task to tranquility or the overlord?

Also, a kill task is used to permanently delete data. Are you trying to cancel the task instead?

I am sending this kill task to the overlord. Yes, I want to permanently delete the data. The reason is that sometimes those tasks wait for handoff for more than 3-4 hours without any luck, and they indirectly slow down realtime ingestion. That's why I want to eliminate them after waiting a certain period. The above command was not working. Is there a problem with how I am using it? Please correct me if I am wrong.

I have now waited over a day, but the tasks are still waiting, so in the end I had to kill the middle manager processes. This impacted realtime ingestion badly: around 20 tasks sit waiting in the queue every hour until the old hour's tasks are handed over, which delays the processing of metrics.

Please help me figure out how to eliminate only the tasks using "kill" rather than removing the middle manager.

Thanks

Bhaskar

Hello Druid

Any update on my query?

Thanks

Bhaskar

Hey Bhaskar,

The command you’re using is for killing segments, not for shutting down indexing tasks. To do that, you’ll want to send the following to the overlord:

HTTP POST: http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/shutdown

(more details here: http://druid.io/docs/latest/design/indexing-service.html)
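
For example, to shut down the stuck task from the first post, the call would look something like this (a sketch, assuming the overlord listens on its default port 8090; substitute your own host, port, and task ID):

curl -X POST http://<OVERLORD_IP>:8090/druid/indexer/v1/task/index_realtime_C1_2016-06-26T03:00:00.000Z_16_0/shutdown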

Why are your tasks not handing off segments? If your task queues are always backing up and not able to hand off segments fast enough, you might need to increase the resources in your cluster.

Hello,
I tried doing this, and the whole datasource got deleted. I thought it was about shutting down indexing tasks so that workers are freed up. Did I do anything wrong? I am very new to Druid, so please explain.

Hey Shantanu,

Welcome to Druid! You’ll have to provide more information on your setup so we can understand what happened, but in general, the task shutdown API talked about in this thread is for exceptional circumstances (e.g. task runs out of memory and gets stuck, stops responding to commands from the overlord, etc.) and isn’t needed in a normal situation. Is there a particular reason you had to shut down the indexing task before it completed?

Hello,
We are running a single-machine setup. Yes, the tasks were hung: data ingestion was complete, yet they were reporting their status as running.

Can you post your indexing task and overlord logs? You can get the indexing task log from the overlord console (by default http://{OVERLORD_IP}:8090).

Hey,
I have attached the indexing task log.

druid-indexing-task.log (7.64 MB)

Ah, the problem is that your windowPeriod is too high, so the indexing task won't complete in a reasonable amount of time. Indexing tasks only push their data to deep storage when the task completes (and if the data doesn't get pushed to deep storage, it isn't queryable after the task terminates).

Tranquility-based tasks run for segmentGranularity + windowPeriod before starting to publish their segments. Try setting your windowPeriod to something less than segmentGranularity; a value like PT15M is more reasonable. You should also change your intermediatePersistPeriod to something less than segmentGranularity.
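
As a rough sketch, the relevant fields in a Tranquility spec look like this (only the timing-related fields are shown; everything else in your dataSchema and tuningConfig stays as you already have it):

"dataSchema": {
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "hour",
    "queryGranularity": "none"
  }
},
"tuningConfig": {
  "type": "realtime",
  "windowPeriod": "PT15M",
  "intermediatePersistPeriod": "PT10M"
}

With hourly segmentGranularity and a PT15M windowPeriod, tasks would publish roughly 15 minutes after the hour ends instead of running on for hours.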

Hey,
Thanks a lot. I will try this.

Hi, I changed the Tranquility Kafka config to reduce the windowPeriod and then restarted the entire service, but those tasks are still there. If I kill those tasks, my data will be gone too. What should I do now?

Hey Shantanu,

You can signal a clean shutdown by using the endpoint:

POST http://<MIDDLEMANAGER_IP>:<TASK_PORT>/druid/worker/v1/chat/<eventReceiverServiceName>/shutdown

This should cause the events already ingested to be published. See here for more details: http://druid.io/docs/0.9.1.1/ingestion/firehose.html
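
As a concrete sketch (the host, port, and service name here are placeholders; how to find the actual eventReceiverServiceName is described later in this thread):

curl -X POST 'http://<MIDDLEMANAGER_IP>:<TASK_PORT>/druid/worker/v1/chat/<eventReceiverServiceName>/shutdown'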

By the way, if you haven't already looked at the Kafka indexing service, I recommend you take a look to see if it will work for your use case. The Kafka indexing service has a number of useful properties, such as exactly-once ingestion guarantees and the ability to ingest historical as well as realtime data. See: http://druid.io/docs/0.9.1.1/development/extensions-core/kafka-ingestion.html
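
(For reference, Kafka indexing service supervisors are also managed through the overlord; a minimal sketch of submitting a supervisor spec, assuming you have written one to a file named supervisor-spec.json:

curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://<OVERLORD_IP>:8090/druid/indexer/v1/supervisor
)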

Thanks a lot for that piece.
OK, I have been using a much shorter windowPeriod setting for some time now, just 20 or 30 minutes. But still, even many hours after the data has been completely consumed, the indexing tasks are not shutting down. What might be going wrong? I have checked the task log and the tranquility-kafka log; nothing irregular there.

It may be because of the out-of-memory exceptions that you mentioned in the other thread. Otherwise, post your logs and we can see what's going on there.

Hey David,
I checked the indexing logs and didn't see any memory issues. I have attached the log file here; please check it out.

A single task has been running for more than 24 hours, so the log is very big and I have uploaded it to Google Drive. Here's the link:

https://drive.google.com/drive/u/0/folders/0B6aHJAUks3NNb3JMUG1WU1h3QWs

What is eventReceiverServiceName? I am not able to understand this. The firehose doc you provided is probably not relevant for me, since I am loading data using Tranquility.

Hey Shantanu,

I don’t have permissions to view that Google Drive folder.

Regarding the eventReceiverServiceName: the way that Tranquility works is it creates realtime indexing tasks in the indexing service which read their data from an EventReceiverFirehose (where the EventReceiverFirehose listens for events pushed to an HTTP endpoint from Tranquility). One way to determine the eventReceiverServiceName is to look in your indexing task logs for entries like these:

EventReceiverFirehoseFactory - Connecting firehose: firehose:test:overlord:github-016-0000-0000

ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[firehose:test:overlord:github-016-0000-0000]

CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='firehose:test:overlord:github-016-0000-0000', host='ip', port=8101}]
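
Putting the pieces together, the shutdown call would then look something like this (a sketch: the service name and port 8101 come from the log entries above, and <HOST> is the machine the task is running on):

curl -X POST 'http://<HOST>:8101/druid/worker/v1/chat/firehose:test:overlord:github-016-0000-0000/shutdown'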