Realtime indexing peons failing to bind to port

Hi all, I wanted to see if this is a known issue. We have been running into it occasionally with our realtime index tasks, and when combined with Tranquility's current behavior it results in a realtime indexing dead zone until Tranquility starts up the next segment.

From what I’ve seen I’ve ruled out the following:

  • Other processes binding to that port between the middleManager determining it's free and the peon starting

  • Multiple middleManagers running on the same host

  • Multiple tasks being assigned the same port

All of our hosts are running Ubuntu 14.04.2 LTS and Druid 0.7.3. I found something that might mitigate it (https://github.com/druid-io/tranquility/issues/9), and possibly someone else who ran into the same problem (https://github.com/druid-io/druid/issues/3876).

2017-03-02T20:59:07,747 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@5be61bcd{/,null,AVAILABLE}

2017-03-02T20:59:07,749 WARN [main] org.eclipse.jetty.util.component.AbstractLifeCycle - FAILED ServerConnector@2f02744{HTTP/1.1}{0.0.0.0:43785}: java.net.BindException: Address already in use

java.net.BindException: Address already in use

at sun.nio.ch.Net.bind0(Native Method) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:444) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:436) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[?:1.7.0_75]

at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.Server.doStart(Server.java:366) [druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [druid-standalone.jar:0.8.3.5]

at io.druid.server.initialization.jetty.JettyServerModule$1.start(JettyServerModule.java:167) [druid-standalone.jar:0.8.3.5]

at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:244) [druid-standalone.jar:0.8.3.5]

at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.CliPeon.run(CliPeon.java:232) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.Main.main(Main.java:99) [druid-standalone.jar:0.8.3.5]

2017-03-02T20:59:07,765 WARN [main] org.eclipse.jetty.util.component.AbstractLifeCycle - FAILED org.eclipse.jetty.server.Server@5f1ec010: java.net.BindException: Address already in use

java.net.BindException: Address already in use

at sun.nio.ch.Net.bind0(Native Method) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:444) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:436) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[?:1.7.0_75]

at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.Server.doStart(Server.java:366) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [druid-standalone.jar:0.8.3.5]

at io.druid.server.initialization.jetty.JettyServerModule$1.start(JettyServerModule.java:167) [druid-standalone.jar:0.8.3.5]

at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:244) [druid-standalone.jar:0.8.3.5]

at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.CliPeon.run(CliPeon.java:232) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.Main.main(Main.java:99) [druid-standalone.jar:0.8.3.5]

2017-03-02T20:59:07,766 ERROR [main] io.druid.cli.CliPeon - Error when starting up. Failing.

java.net.BindException: Address already in use

at sun.nio.ch.Net.bind0(Native Method) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:444) ~[?:1.7.0_75]

at sun.nio.ch.Net.bind(Net.java:436) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) ~[?:1.7.0_75]

at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[?:1.7.0_75]

at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:321) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.server.Server.doStart(Server.java:366) ~[druid-standalone.jar:0.8.3.5]

at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) ~[druid-standalone.jar:0.8.3.5]

at io.druid.server.initialization.jetty.JettyServerModule$1.start(JettyServerModule.java:167) ~[druid-standalone.jar:0.8.3.5]

at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:244) ~[druid-standalone.jar:0.8.3.5]

at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) ~[druid-standalone.jar:0.8.3.5]

at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.CliPeon.run(CliPeon.java:232) [druid-standalone.jar:0.8.3.5]

at io.druid.cli.Main.main(Main.java:99) [druid-standalone.jar:0.8.3.5]


Hi Michael, I'm seeing the same problem. Have you solved it?

On Friday, March 3, 2017 at 6:02:25 AM UTC+8, michael…@hulu.com wrote:

I believe I have found the root cause. The version of Druid we were using contained a bug in ForkingTaskRunner: it was not freeing entries in its internal map of used ports. As a result, each middleManager slowly used up every port from druid.indexer.runner.startPort upward until it was assigning ports in the ephemeral port range (https://en.wikipedia.org/wiki/Ephemeral_port), where the OS may already have handed the same port to an outgoing connection. That is what caused the seemingly random failures.

Patching the code on our deployed version will fix it. It's also worth noting that, judging by master, the bug is not present in the latest version of Druid: https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/overlord/ForkingTaskRunner.java#L486
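
For anyone trying to picture the failure mode described above, here is a minimal sketch of it. This is not the actual ForkingTaskRunner code: the class and method names (PortLeakSketch, PortAllocator, findUnusedPort, release, lastPortAfter) are made up for illustration, and the start port of 8100 is just an assumed value for druid.indexer.runner.startPort. It only shows how an allocator that never releases ports keeps climbing, while one that releases on task exit keeps reusing the start port.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is NOT Druid's ForkingTaskRunner code.
public class PortLeakSketch {

    /** Hands out the lowest port at or above startPort that is not marked as used. */
    static class PortAllocator {
        private final int startPort;
        private final Set<Integer> usedPorts = new HashSet<>();

        PortAllocator(int startPort) {
            this.startPort = startPort;
        }

        synchronized int findUnusedPort() {
            int port = startPort;
            while (usedPorts.contains(port)) {
                port++;
            }
            usedPorts.add(port);
            return port;
        }

        // The step that must run when a task exits; skipping it is the leak.
        synchronized void release(int port) {
            usedPorts.remove(port);
        }
    }

    /** Runs the given number of sequential tasks; returns the port given to the last one. */
    public static int lastPortAfter(int tasks, boolean releaseOnExit) {
        PortAllocator allocator = new PortAllocator(8100); // assumed start port
        int port = -1;
        for (int i = 0; i < tasks; i++) {
            port = allocator.findUnusedPort();
            // ... the peon would run here and then exit ...
            if (releaseOnExit) {
                allocator.release(port);
            }
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println("with release:    " + lastPortAfter(1000, true));  // 8100
        System.out.println("without release: " + lastPortAfter(1000, false)); // 9099
    }
}
```

Without the release call, the assigned port grows by one per task, so a long-lived middleManager would eventually hand out ports the OS also uses for outgoing connections (the failing port in the log above, 43785, falls in the default Linux ephemeral range).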

Hey Michael/Team,

We are facing the same issue with the latest Druid version, 0.10.1. From time to time (about 1 in 100-200 runs), the job fails with the error:

2017-09-11T13:09:25,117 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing. 

java.net.BindException: Address already in use

Was the class ForkingTaskRunner patched with Michael’s changes?

Thanks,

Dan

Hi Michael, I'm seeing the same problem. Could you post your solution?
Thanks!

On Wednesday, March 15, 2017 at 4:51:58 AM UTC+8, michael…@hulu.com wrote: