Stopping and understanding merged tasks in a given stream

Hi Druid Users,
I have the following situation:
Druid nodes were working properly for a given realtime setup until my Kafka cluster crashed. Then one of the realtime nodes spiked in CPU utilization. I managed to revive the Kafka cluster, but the following happened:

  • the affected realtime node did not stabilize (CPU utilization stayed at a very high level)
  • a substantial number of tasks were produced in /tmp/persistent/task/ on the overlord (see the note after this list)
  • those tasks only accumulated, until I was forced to remove the oldest ones to keep the server operational
  • I restarted the realtime node after removing all pertinent merged_* tasks, but I ended up in the same situation of ever-increasing, never-finalized merged_* tasks; as a consequence, segment pushing is only partial
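
A note on the directory above: as far as I understand, its location is controlled by the peon task directory property, i.e. something like this in runtime.properties (the exact value is my assumption about the setup):
druid.indexer.task.baseTaskDir=/tmp/persistent/task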

What is the recommended procedure to restart a realtime node and clean up Druid so that I can start over without the burden described above?
How can I tell Druid to stop creating merge tasks for a given stream?
What is the complete procedure for cleaning up after terminating a given stream? What should be done?

Thank you very much in advance for all hints and answers,
Best,
Pawel

Hey Pawel,

What realtime indexing method were you using?

Hi Gian,
I delegated indexing to the overlord:
druid.selectors.indexing.serviceName=overlord
and in the overlord's runtime.properties:
druid.indexer.fork.property.hadoop.mapred.job.queue.name=druid-indexing
druid.indexer.fork.property.hadoop.mapreduce.job.queuename=druid-indexing
I see that realtime nodes that run without interruption handle merge tasks properly, i.e., there are always something like 20-30 tasks written to the /tmp/persistent/tasks directory.
But nodes that were restarted start to accumulate those tasks and cannot stop. My question is how to clean up during operation (directly removing them does not work, of course). Maybe when I stop a realtime node it is advisable to do something first, e.g., disable the datasource and then re-enable it via the coordinator?
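
To be concrete, by "disable and then enable" I mean the coordinator datasource endpoints, roughly as follows (COORDINATOR_IP, port, and the datasource name are placeholders, and this is my understanding of the API rather than a verified procedure):

**curl -X DELETE http://<COORDINATOR_IP>:<port>/druid/coordinator/v1/datasources/datasourceName1**
**curl -X POST http://<COORDINATOR_IP>:<port>/druid/coordinator/v1/datasources/datasourceName1**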

Best,
Pawel

Hi All,
To summarize, I have a problem with the accumulation of merged_* tasks in the /tmp/persistent/task directory, which quickly leads to disk overload.
This is a problem for ONLY one of my datasources. As time passes, a number of directories following the naming pattern merged_datasourceName_number_date are created and never cleaned up.
I queried the overlord about the status of one of the tasks and received:
{"task":"merge_datasourceName1_f93ac66f0ca1f6c4bb46b296139a63a9685a6216_2016-07-10T14:35:59.675Z","status":{"id":"merge_datasourceName1_f93ac66f0ca1f6c4bb46b296139a63a9685a6216_2016-07-10T14:35:59.675Z","status":"FAILED","duration":-1}}
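
For reference, the status above comes from the overlord's task status endpoint, i.e. something like this (with the real overlord host, port, and task id filled in):

**curl http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/status**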

I also have the same problem after restarting a given realtime node (and cleaning the datasource).

My questions:

  1. Where can I learn more about what is wrong? Only in overlord.log? I have the log level set to WARN and I cannot see anything relevant in the logs.
  2. What is sending
**curl -X POST http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/shutdown**
really going to do in such a situation? Is it different from just a brutal **rm -f**? (See the sketch after this list.)
  3. What is going to happen when I restart the coordinator with:
druid.coordinator.merge.on=false
druid.coordinator.conversion.on=false
(currently I have **true** in both cases)? It is going to stop creating merge tasks, right?
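
Regarding question 2, here is what I would try if the shutdown endpoint is the right approach instead of rm -f; this is only a rough sketch, assuming the overlord's runningTasks listing returns the task ids in an "id" field (jq is used to pull them out), with OVERLORD_IP, port, and the datasource name as placeholders:

OVERLORD="http://<OVERLORD_IP>:<port>"   # fill in the real host and port
# list running tasks, keep the merge_ ones for the affected datasource,
# and ask the overlord to shut each of them down gracefully
curl -s "$OVERLORD/druid/indexer/v1/runningTasks" | jq -r '.[].id' |
  grep '^merge_datasourceName1_' |
  while read taskId; do
    curl -X POST "$OVERLORD/druid/indexer/v1/task/$taskId/shutdown"
  done
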
Thank you very much in advance for all replies,
Best,
Pawel

Some more information in addition to the above:
Only one task out of more than 300 ended up with an empty corresponding directory. This task has the status:
{"task":"merge_datasourceName1_bea0312c992a442c7de246544736633fe4ca84d6_2016-07-09T21:05:55.768Z","status":{"id":"merge_datasourceName1_bea0312c992a442c7de246544736633fe4ca84d6_2016-07-09T21:05:55.768Z","status":"FAILED","duration":13140}}
So this task logged a duration > 0 but also failed. All other tasks also ended with FAILED status, but with duration = -1 and a non-empty directory left behind.

So if one merge task fails, are the others for the same datasource also expected to fail?
How can I clean this up?

Best,
Pawel

Ok, I sorted out the problem.

The logs of a given task can be accessed through

**curl http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/log**

The endpoint is not present in the documentation, but it is in the source code.
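
For anyone else hitting this, the log can also be saved to a file and searched for errors, e.g. (task.log is just an example filename):

**curl -o task.log http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/log**
**grep ERROR task.log**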

Best,
Pawel