SASL authentication exception if druid.zk.service.host is set to ZooKeeper's IP address

In my Druid setup, ZooKeeper is hosted on hbs01.tpl-hadoop-wh.com, and its IP address is 10.21.9.138.

In common.runtime.properties, if the property “druid.zk.service.host” is set to the IP address (10.21.9.138), it fails with the errors below. If it is set to the DNS name (hbs01.tpl-hadoop-wh.com), it works fine. I have spent days trying to figure out this tricky issue. Does anyone know why this happens? Thanks a lot for sharing!

By the way, I use Kerberos v5 authentication. Judging from the log, the Kerberos login was successful, so it should be irrelevant here.

2018-11-09T07:20:06,529 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.Login - Client successfully logged in.

2018-11-09T07:20:06,531 INFO [Thread-51] org.apache.zookeeper.Login - TGT refresh thread started.

2018-11-09T07:20:06,535 INFO [Thread-51] org.apache.zookeeper.Login - TGT valid starting at: Fri Nov 09 07:19:31 UTC 2018

2018-11-09T07:20:06,535 INFO [Thread-51] org.apache.zookeeper.Login - TGT expires: Fri Nov 09 19:19:31 UTC 2018

2018-11-09T07:20:06,536 INFO [Thread-51] org.apache.zookeeper.Login - TGT refresh sleeping until: Fri Nov 09 17:09:21 UTC 2018

2018-11-09T07:20:06,537 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.client.ZooKeeperSaslClient - Client will use GSSAPI as SASL mechanism.

2018-11-09T07:20:06,543 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will attempt to SASL-authenticate using Login Context section ‘Client’

2018-11-09T07:20:06,564 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session

2018-11-09T07:20:06,575 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x10003a2a6fa000b, negotiated timeout = 30000

2018-11-09T07:20:06,583 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED

2018-11-09T07:20:06,618 ERROR [main-SendThread(localhost:2181)] org.apache.zookeeper.client.ZooKeeperSaslClient - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member’s received SASL token. Zookeeper Client will go to AUTH_FAILED state.

2018-11-09T07:20:06,618 ERROR [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member’s received SASL token. Zookeeper Client will go to AUTH_FAILED state.

2018-11-09T07:20:06,618 ERROR [main-EventThread] org.apache.curator.ConnectionState - Authentication failed


I haven’t encountered this error before, but based on a Google search for it, it seems the client is able to get a ticket, but there is a mismatch somewhere between your DNS resolution for hbs01.tpl-hadoop-wh.com, the SPN used by ZooKeeper in your Kerberos database, and the IP address 10.21.9.138.
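To make that kind of mismatch easier to spot, here is a rough Java sketch. It rests on an assumption I have not verified against your versions: that the ZooKeeper client builds the server principal as a service name (default "zookeeper") plus "/" plus whatever host name it resolves for the configured address. If reverse DNS does not map 10.21.9.138 back to hbs01.tpl-hadoop-wh.com, the client would then ask the KDC for something like zookeeper/10.21.9.138, which would explain "Server not found in Kerberos database". The class name SpnCheck and its helpers are hypothetical and only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Druid or ZooKeeper.
class SpnCheck {

    // Builds a "<service>/<host>" string, the usual shape of a Kerberos SPN
    // (realm omitted). That the client derives it this way is an assumption.
    static String servicePrincipal(String service, String host) {
        return service + "/" + host;
    }

    // Returns true if the string looks like a dotted IPv4 literal.
    static boolean looksLikeIpv4(String host) {
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the value you put in druid.zk.service.host as the first argument.
        String configured = args.length > 0 ? args[0] : "localhost";

        InetAddress addr = InetAddress.getByName(configured);
        // Reverse lookup; the JDK falls back to the textual IP if it fails.
        String resolved = addr.getCanonicalHostName();

        System.out.println("configured host : " + configured);
        System.out.println("resolved address: " + addr.getHostAddress());
        System.out.println("canonical name  : " + resolved);
        System.out.println("likely SPN      : " + servicePrincipal("zookeeper", resolved));

        if (looksLikeIpv4(resolved)) {
            System.out.println("Reverse DNS did not return a host name; "
                + "an SPN built from an IP is unlikely to exist in the KDC.");
        }
    }
}
```

Running it once with 10.21.9.138 and once with hbs01.tpl-hadoop-wh.com on the Druid host should show whether both resolve to the same canonical name. You can then compare the "likely SPN" line with the principals that actually exist in your Kerberos database.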

Thanks,

Jon