stopGracefully() resulting in Kerberos authentication errors after Hadoop index task failure

Druid version: 0.15.0
What is happening:

  • When a Hadoop batch task fails, the task calls HadoopIndexTask.stopGracefully() to clean up resources on the Hadoop cluster. During this method, I’m seeing Kerberos errors, specifically: “Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)”. We have no Kerberos issues during other parts of Druid’s interaction with Hadoop (deep storage, starting the task, etc.).

Has anyone else dealt with this problem before?

There are a couple of things that can trigger this kind of error. The Kerberos ticket (TGT) may have expired while the task was in the middle of shutting down gracefully, the location where the TGT is stored may have incorrect permissions, etc.
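
To rule out the permissions cause, a small script like the one below can be run as the same OS user that runs the Druid task. This is only a rough sketch, not anything Druid ships: the helper names (ticket_cache_path, check_ticket_cache) are made up for illustration, and it assumes a file-based ticket cache located via KRB5CCNAME or the common Linux default /tmp/krb5cc_<uid>. It does not check ticket expiry; klist run as that user shows the expiry time.

```python
import os
import stat


def ticket_cache_path(env=None, uid=None):
    """Guess the file path of the Kerberos ticket cache for this user.

    Assumption: a FILE-type cache. Returns None for other cache types
    (for example KEYRING: or DIR:), which this sketch does not handle.
    """
    env = os.environ if env is None else env
    name = env.get("KRB5CCNAME", "")
    if name.startswith("FILE:"):
        return name[len("FILE:"):]
    if name and ":" not in name:
        return name  # a bare path is treated as a file cache
    if name:
        return None  # some other cache type
    uid = os.getuid() if uid is None else uid
    return "/tmp/krb5cc_%d" % uid  # common default on Linux


def check_ticket_cache(path):
    """Return a list of problems found with the cache file (empty if none)."""
    if path is None:
        return ["non-file cache type; inspect it with klist instead"]
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return ["no ticket cache at %s" % path]
    except OSError as exc:
        return ["cannot stat %s: %s" % (path, exc)]

    problems = []
    if st.st_uid != os.getuid():
        problems.append("owned by uid %d, not the current user" % st.st_uid)
    if not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Caches are normally created with mode 0600; anything wider is unusual.
        problems.append("mode %o is wider than 0600" % stat.S_IMODE(st.st_mode))
    return problems


if __name__ == "__main__":
    cache = ticket_cache_path()
    found = check_ticket_cache(cache)
    print("ticket cache:", cache)
    print("\n".join(found) if found else "no permission problems found")
```

If this reports nothing and klist shows a ticket that is still valid, the expiry or permissions explanations become less likely for that user and host.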

Rommel Garcia