Unable to kill a waiting task; all subsequent realtime tasks are blocked

I’m using Tranquility to stream events into Druid with hourly granularity. A task has suddenly appeared in the “Waiting Tasks - Tasks waiting on locks” section. It has been there for a few hours and has blocked every hourly ingestion task since.

A GET request to http://localhost:8090/druid/indexer/v1/task/index_realtime_ds_2018-05-25T06:00:00.000Z_0_0/status

reports the status as running and the duration as -1.

Running curl -XPOST http://localhost:8090/druid/indexer/v1/task/index_realtime_ds_2018-05-25T06:00:00.000Z_0_0/shutdown

takes a very long time and then returns {"task": "index_realtime_ds_2018-05-25T06:00:00.000Z_0_0"}
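For anyone repeating the two calls above, here is a minimal shell sketch that wraps them with a client-side timeout, so a hung shutdown request does not block the terminal. Only the two endpoint paths come from the commands in this post. The function names, the OVERLORD variable, and the 30-second limit are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: OVERLORD and the function names are assumptions,
# not part of Druid. Only the endpoint paths come from the post.
OVERLORD=${OVERLORD:-http://localhost:8090}

# Build the Overlord URL for a task action ("status" or "shutdown").
task_url() {
    printf '%s/druid/indexer/v1/task/%s/%s\n' "$OVERLORD" "$1" "$2"
}

# GET the task status; curl gives up after 30 seconds.
task_status() {
    curl -sS --max-time 30 "$(task_url "$1" status)"
}

# POST a shutdown request; curl gives up after 30 seconds.
task_shutdown() {
    curl -sS --max-time 30 -X POST "$(task_url "$1" shutdown)"
}
```

With these defined, task_status index_realtime_ds_2018-05-25T06:00:00.000Z_0_0 issues the same status request as above. The timeout only stops the client from waiting. It does not change what the Overlord does with the request.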

The Overlord Console does not help either: the kill button there returns “Kill request failed with status: 0 please check overlord logs.”

I even tried shutting down the firehose by executing

curl -XPOST http://0.0.0.0:8100/druid/worker/v1/chat/firehose:druid:overlord:ds-006-0000-0000/shutdown

It returns “connection refused”, which suggests the firehose has already shut down.

There are no exceptions in the Druid logs. However, the Tranquility log shows:

c.m.t.server.http.TranquilityServlet - Server error serving request to http://172.30.1.234:8200/v1/post/ds

java.lang.IllegalStateException: Failed to create merged beam: druid:overlord/ds


Caused by: com.twitter.finagle.GlobalRequestTimeoutException: exceeded 1.minutes+30.seconds to disco!druid:overlord while waiting for a response for the request, including retries (if applicable)


ERROR c.m.tranquility.beam.ClusteredBeam - Failed to update cluster state: druid:overlord/ds

com.twitter.finagle.GlobalRequestTimeoutException: exceeded 1.minutes+30.seconds to disco!druid:overlord while waiting for a response for the request, including retries (if applicable)

at com.twitter.finagle.NoStacktrace(Unknown Source) ~[na:na]

2018-05-25 11:34:33,654 [ClusteredBeam-ZkFuturePool-8176ebe2-2cea-4040-b050-efbb8da56ff0] WARN c.m.tranquility.beam.ClusteredBeam - Emitting alert: [anomaly] Failed to create merged beam: druid:overlord/ds


I believe these errors appear every hour, each time Tranquility fails to create a new realtime ingestion task.

How can I free the task that is waiting on a lock?

I finally solved this issue.
I had started Druid as the superuser, which left the lock file owned by root.

After I manually deleted the lock file and started Druid as a normal user, everything was fine.
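To check for the same ownership problem, a rough sketch like the following lists files that are not owned by the user Druid is meant to run as. The post does not say where the lock file lives, so the directory argument is a placeholder you must supply yourself, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every file under DIR that is NOT owned by OWNER.
# OWNER defaults to the current numeric uid. DIR is whichever directory your
# Druid installation writes to (not stated in the post, so it is an assumption).
list_foreign_owned() {
    dir=$1
    owner=${2:-$(id -u)}
    find "$dir" ! -user "$owner" -print
}
```

For example, list_foreign_owned /path/to/druid/dir druid prints anything in that tree owned by someone other than the druid user (both the path and the user name are placeholders). Empty output means every file already belongs to that user.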