Interesting, I haven’t heard of this issue before. Can you reproduce it consistently?
The parallel task does not talk to its subtasks directly to check their status. Instead, it polls the Overlord periodically (every 1 second by default). So if the task status check looked slow, it could mean that the call to the Overlord API took longer than usual. I think this can happen in the following cases:
- The Overlord stopped for a while for some reason. The parallel task will wait indefinitely for the Overlord to come back.
- The Overlord took a long time to process the API call. This is uncommon, especially if it processed the previous calls quickly, but it is possible. The parallel task will wait up to 15 minutes by default before it times out. If this is what happened, you should see something notable in the Overlord logs, e.g., heavy GCs.
- The HTTP request for the API call failed. In this case, the parallel task will retry the request up to 5 times. If this is what happened, you should see retry messages in the parallel task logs, something like
Request[http://${overlord}/druid/indexer/v1/task/${taskId}/status] failed.
In my experience, the first and third cases are the most common.
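
If you want to check whether the Overlord API itself is the slow part, you could time the same status endpoint from the machine where the parallel task runs. Below is a rough sketch, not anything shipped with Druid: the helper names (status_url, http_get, time_status_calls) are made up for illustration, and only the URL path is taken from the log message above.

```python
import time
import urllib.request
from typing import Callable, List, Optional, Tuple


def status_url(overlord: str, task_id: str) -> str:
    """Build the task status URL, following the path shown in the retry log."""
    return f"http://{overlord}/druid/indexer/v1/task/{task_id}/status"


def http_get(url: str, timeout_s: float = 10.0) -> bytes:
    """Plain HTTP GET; the 10-second timeout is an arbitrary choice for this sketch."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def time_status_calls(
    url: str,
    fetch: Callable[[str], bytes] = http_get,
    attempts: int = 5,
    pause_s: float = 1.0,
) -> List[Tuple[float, Optional[str]]]:
    """Call the URL `attempts` times and record (latency_seconds, error) per call.

    `error` is None when the call succeeded, otherwise a short description
    of the exception. The pause between calls loosely mimics a 1-second
    polling period; it is not Druid's actual polling code.
    """
    results: List[Tuple[float, Optional[str]]] = []
    for i in range(attempts):
        start = time.monotonic()
        error: Optional[str] = None
        try:
            fetch(url)
        except Exception as exc:  # record the failure instead of aborting the run
            error = repr(exc)
        results.append((time.monotonic() - start, error))
        if i < attempts - 1:
            time.sleep(pause_s)
    return results
```

For example, time_status_calls(status_url("overlord-host:8090", "your_task_id")) returns one (latency, error) pair per call; the host, port, and task ID here are placeholders you would replace with your own. Consistently high latencies would point to the second case above, while intermittent errors would point to the first or third.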