Druid logs - missing failed task logs?

Running on Imply 3.0.3.

Hi all, we are trying to load some data on one of our clusters, but the loads keep failing.

Looking at the logs on the MiddleManager:

2019-08-13T19:05:29,746 INFO [forking-task-runner-1] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_parallel_splitab-test_2019-08-13T19:05:29.731Z] status changed to [RUNNING].
2019-08-13T19:05:29,746 INFO [forking-task-runner-1] org.apache.druid.indexing.overlord.ForkingTaskRunner - Logging task index_parallel_splitab-test_2019-08-13T19:05:29.731Z output to: /opt/data/druid/task/index_parallel_splitab-test_2019-08-13T19:05:29.731Z/log
2019-08-13T19:06:46,023 INFO [WorkerTaskManager-CompletedTasksCleaner] org.apache.druid.indexing.worker.WorkerTaskManager - Deleting completed task[index_parallel_splitab_2019-08-13T19:01:40.210Z] information, overlord task status[FAILED].
2019-08-13T19:07:54,432 INFO [forking-task-runner-1-[index_parallel_splitab-test_2019-08-13T19:05:29.731Z]] org.apache.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task: index_parallel_splitab-test_2019-08-13T19:05:29.731Z
2019-08-13T19:07:54,432 INFO [forking-task-runner-1] org.apache.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: log/index_parallel_splitab-test_2019-08-13T19:05:29.731Z.log
2019-08-13T19:07:54,432 INFO [forking-task-runner-1] org.apache.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task report to: log/index_parallel_splitab-test_2019-08-13T19:05:29.731Z.report.json
2019-08-13T19:07:54,433 INFO [forking-task-runner-1] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_parallel_splitab-test_2019-08-13T19:05:29.731Z] status changed to [FAILED].
2019-08-13T19:07:54,433 INFO [forking-task-runner-1] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: /opt/data/druid/task/index_parallel_splitab-test_2019-08-13T19:05:29.731Z
2019-08-13T19:07:54,519 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job’s finished. Completed [index_parallel_splitab-test_2019-08-13T19:05:29.731Z] with status [FAILED]

What has me puzzled is that I can’t find the task log, and the UI cannot find the log for this task either, which makes it very difficult to troubleshoot.

So my question is: where is it actually trying to write the task log? Looking in /opt/data/task/ I see other tasks but not this one. I see no ‘log’ directory under /opt/data, yet the MiddleManager log insists it is writing to: log/index_parallel_splitab-test_2019-08-13T19:05:29.731Z.log

Here is what I believe is happening:

I resubmitted the load, and this time, while it was running, I went to /opt/data/tasks/ and saw the task directory. I can go into it and tail the log while the task is running, but once the task fails, the log vanishes!

Note: under /opt/data/tasks/ I see completeTasks and assignedTasks. Should there also be a failedTasks?

Thanks

Dan

FYI: the task is failing due to permission issues writing to S3.

Does it by chance attempt to push the task log there as well?

Hi Dan, yes, you are correct. When Druid has no permission to write the indexing task log to its destination, which is also on S3 (?), the task log is lost when the task finishes, regardless of whether it succeeds or fails. Can you double check your common.runtime.properties?
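
For that check, the task log destination is set by the druid.indexer.logs.* properties: druid.indexer.logs.type, plus druid.indexer.logs.s3Bucket and druid.indexer.logs.s3Prefix for S3, or druid.indexer.logs.directory for local files. Below is a minimal Python sketch that pulls just those entries out of a properties file. The helper name task_log_settings and the sample values are made up for illustration, and the parsing is simplified: it only understands key=value lines.

```python
def task_log_settings(properties_text):
    """Return the druid.indexer.logs.* entries found in Java-properties text."""
    settings = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or '!' start a comment).
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("druid.indexer.logs."):
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents, for illustration only (not from this cluster).
example = """
# Task logs
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=my-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
druid.storage.type=s3
"""

for key, value in task_log_settings(example).items():
    print(key, "=", value)
```

To run it against a real file, pass in the file's text, e.g. task_log_settings(open(path).read()), where path is wherever your install keeps common.runtime.properties. If the type turns out to be file with a relative directory, the path is presumably resolved against the working directory of the MiddleManager process.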

thanks

OK, that just seems like a Druid bug. If the system fails to push its logs to S3, then the logs shouldn’t vanish. I would think it would leave the logs in a failed_tasks directory or some such.