Re: [druid-user] Native Batch Ingest sub-task completion status not updating

Interesting, I haven’t heard of this issue before. Can you reproduce it consistently?

The parallel task does not talk to its subtasks directly to check their status. Instead, it polls the Overlord periodically (1 sec polling period by default). So, if the task status check looked slow, the call to the Overlord API probably took longer than usual. I can think of three cases where this might happen:

  • The Overlord stopped for a while for some reason. The parallel task will wait for the Overlord to come back indefinitely.
  • The Overlord took a long time to process the API call. This is uncommon, especially if it processed the previous calls quickly, but it is possible. The parallel task will wait up to 15 mins by default before it times out. If this is what happened, you should see something notable in the Overlord logs, e.g., heavy GCs.
  • The HTTP request for the API call failed. In this case, the parallel task will try to resend the request up to 5 times. If this is what happened, you should see the retry logs in the parallel task logs, something like "Request[http://${overlord}/druid/indexer/v1/task/${taskId}/status] failed."

In my experience, the first and third cases are the most common.
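
To make the third case concrete, here is a minimal Python sketch of a status poller with bounded retries. It is not Druid's actual implementation, only an illustration of the behavior described above: poll every second, and give up after 5 consecutive resend attempts fail. The function names, the 30-second HTTP timeout, and the "statusCode" response field are my assumptions; only the URL path and the default numbers come from the description above.

```python
import json
import time
import urllib.request

# Task states treated as "finished" in this sketch.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def http_fetch_status(overlord_url):
    """Build a fetcher that asks the Overlord for one task's status.

    Assumption: the JSON body carries the state under
    body["status"]["statusCode"]; check this against your Druid version.
    """
    def fetch(task_id):
        url = f"{overlord_url}/druid/indexer/v1/task/{task_id}/status"
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = json.load(resp)
        return body["status"]["statusCode"]
    return fetch


def poll_task_status(fetch_status, task_id, poll_period_s=1.0,
                     max_retries=5, sleep=time.sleep):
    """Poll until the task reaches a terminal state.

    A failed request is retried up to max_retries times in a row;
    the next consecutive failure is re-raised to the caller.
    """
    failures = 0
    while True:
        try:
            state = fetch_status(task_id)
        except OSError:  # urllib's URLError and socket timeouts are OSErrors
            failures += 1
            if failures > max_retries:
                raise
            sleep(poll_period_s)
            continue
        failures = 0  # a successful call resets the retry budget
        if state in TERMINAL_STATES:
            return state
        sleep(poll_period_s)
```

Usage would look like poll_task_status(http_fetch_status("http://overlord:8090"), task_id), where the host and port are placeholders. Note that every slow or failed call delays the moment the caller sees SUCCESS, which is why a subtask can look "stuck" even though it already finished.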

Hi Franklyn,

The forking task runner monitors the forked Java process that runs the
task. The task status update should be propagated to the forking task
runner as soon as the task process exits.
Can you always reproduce this issue? If so, can you check what the
forking task runner and the forked Java process were doing?
The thread name of the forking task runner should end with the task ID.
The forked Java process should be running a java command starting with
'Main internal peon'.
You may want to capture flame graphs of those processes.

explains how to capture them.
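
Before capturing anything, it can help to confirm which thread and which process belong to the task. The following is a rough Python sketch, assuming you have saved a jstack-style thread dump of the process hosting the forking task runner and the output of a command such as "ps -ef". The helper names are made up for illustration, and tolerating a trailing "]" after the task ID in the thread name is my assumption about the name format.

```python
import re
from typing import List

# In a jstack-style dump, each thread section starts with the quoted name.
_THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def threads_for_task(thread_dump: str, task_id: str) -> List[str]:
    """Return the names of threads whose name ends with the task ID."""
    matches = []
    for line in thread_dump.splitlines():
        m = _THREAD_HEADER.match(line)
        if m is None:
            continue  # stack frame or blank line, not a thread header
        name = m.group("name")
        # Assumption: the ID may be wrapped in brackets, e.g. "...-[taskId]".
        if name.rstrip("]").endswith(task_id):
            matches.append(name)
    return matches


def peon_processes(ps_output: str, task_id: str) -> List[str]:
    """Return process-listing lines that look like the peon for this task."""
    return [
        line for line in ps_output.splitlines()
        if "internal peon" in line and task_id in line
    ]
```

If threads_for_task returns nothing while the peon process is still listed, that already narrows down which side to profile first.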