Packet len is out of range

Hi,

I am seeing the following errors in my overlord logs:

2016-11-01T16:20:45,052 INFO [Curator-LeaderSelector-0-SendThread(xxxxxxx:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server xxxxxxx/xxxxxxx:2181. Will not attempt to authenticate using SASL (unknown error)

2016-11-01T16:20:45,052 INFO [Curator-LeaderSelector-0-SendThread(xxxxxxx:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to xxxxxxx/xxxxxxx:2181, initiating session

2016-11-01T16:20:45,053 WARN [Curator-LeaderSelector-0-SendThread(xxxxxxx:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server xxxxxxx/xxxxxxx:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Packet len1213486160 is out of range!

at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar:3.4.8–1]

``

I have read that this error is thrown when the response from Zookeeper is too large. However, the only thing I am running at the moment is Zookeeper and Overlord in two different Docker containers on the same host.

Any clues?

I would've told you to adjust your jute.maxbuffer size (I think that's
the name of the property), but given that you are only running those
two processes, I don't think that's it. Also, the fact that the len
is 1213486160 is pretty crazy. From the little I know about when I've
seen that error legitimately, the protocol that jute uses appear to
start with a "length" parameter that tells jute how many bytes are in
the message, that number is compared to the "max buffer" size and if
it is too large, it ends up printed out in the log line you see there.

So that means that whatever the ZK client is trying to communicate
with is sending back something that has 1213486160 in the place where
the "length" parameter normally would be for jute. My guess is that
the ports might be messed up and the communication is actually talking
to something not zookeeper or there's something causing some sort of
data corruption on the network transmission.

--Eric

Eric that’s a great guess.

1213486160 in network order is 0x48545450, or… “HTTP”. That’s a web server, probably responding “HTTP/1.1 400 Bad Request” to whatever the ZK client said.

Thanks for the replies! That confirms the hunch that I had.

However, the plot thickens:

2016-11-02T10:29:50,851 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=internal-xxx.elb.amazonaws.com sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@1ba359bd

2016-11-02T10:29:50,901 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.metadata.SQLMetadataSupervisorManager.start()] on object[io.druid.metadata.SQLMetadataSupervisorManager@1e469dfd].

2016-11-02T10:29:50,907 INFO [main] io.druid.metadata.SQLMetadataConnector - Table[druid_supervisors] already exists

2016-11-02T10:29:50,908 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.indexing.overlord.TaskMaster.start()] on object[io.druid.indexing.overlord.TaskMaster@2d9dff65].

2016-11-02T10:29:50,914 INFO [main] org.eclipse.jetty.server.Server - jetty-9.2.5.v20141112

2016-11-02T10:29:50,919 INFO [main-SendThread(ip-xxx-xx-108-104.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-xxx-xx-xxx-104.xxx.internal/xxx.xx.xxx.104:2181. Will not attempt to authenticate using SASL (unknown error)

2016-11-02T10:29:51,128 INFO [main-SendThread(ip-xxx-xx-108-104.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-xxx-xx-xxx-104.xxx.internal/xxx.xx.xxx.104:2181, initiating session

2016-11-02T10:29:51,134 WARN [main-SendThread(ip-xxx-xx-108-104.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server ip-xxx-xx-xxx-104.xxx.internal/xxx.xx.xxx.104:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Packet len1213486160 is out of range!

at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar:3.4.8–1]

2016-11-02T10:29:51,419 INFO [main-SendThread(ip-xxx-xx-121-200.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-xxx-xx-xxx-200.xxx.internal/xxx.xx.xxx.200:2181. Will not attempt to authenticate using SASL (unknown error)

2016-11-02T10:29:51,422 INFO [main-SendThread(ip-xxx-xx-121-200.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-xxx-xx-xxx-200.xxx.internal/xxx.xx.xxx.200:2181, initiating session

2016-11-02T10:29:51,427 WARN [main-SendThread(ip-xxx-xx-121-200.xxx.internal:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server ip-xxx-xx-xxx-200.xxx.internal/xxx.xx.xxx.200:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Packet len1213486160 is out of range!

at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar:3.4.8–1]

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar:3.4.8–1]

``

I am running Zookeeper behind and Elastic (application) Load Balancer. You can see Zookeeper connecting to the correct druid.zk.service.host: internal-xxx.elb.amazonaws.com. After that it goes on to connect to two hosts: ip-xxx-xx-xxx-200.xxx.internal and ip-xxx-xx-xxx-104.xxx.internal. However, these two hosts are completely unfamiliar. I have gone through all resources in the AWS account, but those hosts are not associated with anything. I have destroyed and restarted all related services, but still, those two hosts keep popping up.

Now my question is: Where would Zookeeper generally get those addresses from? I am quite new to Zookeeper (picking up a book this afternoon), and tried going through the docs, but I can’t seem to figure it out.

Thanks in advance!

I bet those mystery hosts are the internal addresses of the ELB you set up. ZK may be resolving the ELB hostname to IPs (which may be be Amazon IPs, not your instances) and then talking to those directly.

Gian

Were you able to find out the solution for this situation, where Zookeeper nodes are under ELB.

Hi Sahil,

It’s best to avoid load balancers when using ZK. The ZK protocol relies on the client doing its own client-side load balancing, and so the client expects to get direct access to each ZK server individually.