Hi,
I have been using Druid 0.17.0 on a single-host, Docker-based system for testing. The system ran fine for 2 weeks and handled both streaming and batch loads until very recently. Now, however, every new task ends up in PENDING status and makes no progress. Both streaming and batch ingestion tasks are affected.
In the Overlord process logs I see the following:
2020-04-03T09:10:48,595 INFO [qtp890160784-67] org.apache.druid.indexing.overlord.MetadataTaskStorage - Inserting task index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z with status: TaskStatus{id=index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z, status=RUNNING, duration=-1, errorMsg=null}
2020-04-03T09:10:48,597 INFO [qtp890160784-67] org.apache.druid.indexing.overlord.TaskLockbox - Adding task[index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z] to activeTasks
2020-04-03T09:10:48,598 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z
2020-04-03T09:10:48,598 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z
2020-04-03T09:10:54,916 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Assigned a task[index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z] that is already pending!
2020-04-03T09:11:54,916 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Assigned a task[index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z] that is already pending!
This particular message, "Assigned a task[index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z] that is already pending!", keeps repeating in the log file, but the task never progresses. Five tasks are in this state (3 streaming ingestion tasks and 2 batch ingestion tasks).
Any help with the following questions is highly appreciated:
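To see which tasks are stuck and how often the Overlord retries them, one could count the repeated message per task ID. The following is a minimal Python sketch, assuming the Overlord log is a plain-text file whose lines match the format quoted above; the script name and log path in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the RemoteTaskRunner message quoted above and captures the task ID
# (assumes task IDs never contain a closing bracket).
PENDING_RE = re.compile(r"Assigned a task\[(?P<task_id>[^\]]+)\] that is already pending!")


def count_already_pending(lines: Iterable[str]) -> Counter:
    """Count how many 'already pending' messages each task ID has."""
    counts: Counter = Counter()
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            counts[match.group("task_id")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python stuck_tasks.py /path/to/overlord.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for task_id, n in count_already_pending(log).most_common():
            print(f"{n:6d}  {task_id}")
```

With one line per stuck task in the output, the count should match the number of affected tasks (5 in this case) if the assumption about the log format holds.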
- What may be causing this issue? The tasks have been pending for more than 10 hours now, and every new task ends up in the same state.
- How can I get the system back to a working state? Is it safe to restart the processes in this state, and if so, which processes need to be restarted?
- Is it safe to kill these tasks through REST API calls?
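For reference, the REST calls in question would look roughly like the sketch below, which uses the Overlord endpoints `GET /druid/indexer/v1/pendingTasks` and `POST /druid/indexer/v1/task/{taskId}/shutdown`. It assumes the Overlord listens on `localhost:8090` (Druid's default Overlord port); in a Docker setup where the Coordinator also acts as Overlord, the port may be 8081 instead. The sketch only builds and sends the requests and says nothing about whether killing the tasks is safe.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Overlord address; adjust host/port for your Docker setup.
OVERLORD_URL = "http://localhost:8090"


def pending_tasks_request(base_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build a GET request for the Overlord's list of pending tasks."""
    return urllib.request.Request(f"{base_url}/druid/indexer/v1/pendingTasks", method="GET")


def shutdown_task_request(task_id: str, base_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build a POST request asking the Overlord to shut down one task."""
    # Task IDs contain ':' characters, so URL-encode the whole path segment.
    quoted = urllib.parse.quote(task_id, safe="")
    return urllib.request.Request(
        f"{base_url}/druid/indexer/v1/task/{quoted}/shutdown", data=b"", method="POST"
    )


def send(req: urllib.request.Request) -> object:
    """Send a request to the Overlord and decode its JSON response."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(pending_tasks_request())` would list the pending tasks, and `send(shutdown_task_request("index_parallel_XXXXX_pjglcghe_2020-04-03T09:10:48.594Z"))` would ask the Overlord to shut down that one task.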
Thanks,
Shashi