Tranquility: ZK connection issue

I’m having a ZooKeeper (ZK) connection issue with Tranquility. Here are some more details:

  • I’m running Spark Streaming with 20-second batches.
  • I don’t see anything posted to the overlord/middle manager, or any log activity there.
  • I can see the datasource shard info in ZK.
  • It starts with a "Connection reset by peer" error and eventually ends in an OutOfMemoryError.
  • I’m able to connect to ZK from the Spark node.
  • During the same period of time I’m able to connect with nc.
  • Log details are below.

I would really appreciate any tips or input on what to look for.

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_73]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_73]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_73]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_73]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_73]
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar:3.4.8--1]
2016-07-18T14:16:49,718 WARN [Executor task launch worker-12-SendThread(172.31.13.55:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server 172.31.13.55/172.31.13.55:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_73]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_73]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_73]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_73]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_73]
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar:3.4.8--1]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar:3.4.8--1]

I have increased the maximum client connections on each ZK node to 300 (maxClientCnxns=300) for the 3-node ZK cluster. However, I’m still seeing the same contention issue; it now survives a little longer than before. One thing I noticed: the connection contention shows up more often when data is being dropped (events not matching the window period) or when the job is running a little behind.
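
Since maxClientCnxns is enforced per client IP on each ZK server, one way to check whether a Spark node is actually approaching the limit is to count open connections per IP using ZooKeeper's `cons` four-letter command. The following is a minimal sketch, not a definitive tool: the class name `ZkConnCount` and its methods are hypothetical, and the parser assumes each connection line in the `cons` output starts with `/<ip>:<port>[`. The default host is the server address from the log above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZkConnCount {
    // Assumed line format of `cons` output, e.g.
    //  /172.31.13.60:40000[1](queued=0,recved=5,sent=5)
    // Group 1 captures the client address (everything before the last ":port[").
    private static final Pattern CLIENT =
        Pattern.compile("^\\s*/(\\S+):\\d+\\[", Pattern.MULTILINE);

    // Sends the four-letter command "cons" and reads the reply until the
    // server closes the socket.
    static String fetchCons(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write("cons".getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Counts connections per client address in the given `cons` output.
    static Map<String, Integer> countByIp(String consOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = CLIENT.matcher(consOutput);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Default host is taken from the log in this post; pass another as args[0].
        String host = args.length > 0 ? args[0] : "172.31.13.55";
        countByIp(fetchCons(host, 2181))
            .forEach((ip, n) -> System.out.println(ip + " " + n));
    }
}
```

Run it against each of the three ZK servers while the streaming job is active. The probe's own connection is included in the counts. If a single Spark node's count keeps climbing toward 300, that would point at clients being created repeatedly (for example once per batch or task) rather than reused.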