Transient errors when creating new tasks

Hi,
we are using Tranquility 0.4.0 to push messages from Kafka to Druid. Every time our Tranquility app tries to create new tasks (or get existing ones), we get a ton of transient errors. Usually the errors go away after a few seconds and everything works smoothly until the next segment. Do you have any idea where the problem could be? Full error:

2015-04-16 09:12:29,430 [finagle/netty3-1] WARN com.metamx.tranquility.finagle.FutureRetry$ - Transient error, will try again in 26896 ms

java.io.IOException: Unable to push events to task: index_realtime_sdgadserver-videoimpression_2015-04-16T09:00:00.000Z_0_1 (status = TaskRunning)

at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:160) ~[tranquility_2.10-0.4.0.jar:0.4.0]

at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:146) ~[tranquility_2.10-0.4.0.jar:0.4.0]

at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$4.apply(Future.scala:828) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Try$.apply(Try.scala:13) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Future$.apply(Future.scala:84) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:828) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:828) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:791) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:790) ~[util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:96) [util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Promise$Transformer.k(Promise.scala:96) [util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Promise$Transformer.apply(Promise.scala:106) [util-core_2.10-6.20.0.jar:6.20.0]

at com.twitter.util.Promise$Transformer.apply(Promise.scala:87) [util-core_2.10-6.20.0.jar:6.20.0]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_25]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]

Caused by: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for disco!druid:firehose:sdgadserver-videoimpression-09-0000-0001

at com.twitter.finagle.NoStacktrace(Unknown Source) ~[?:?]

It sounds like for some reason your tasks are taking a long time to start up. How long are things stuck with these transient errors? Are the new tasks running during that time? If it's on the order of 5+ minutes, it might be that you don't have enough indexing worker capacity to run both the old and new tasks at once, and the new tasks need to wait for the old ones to hand off and exit. You can solve that by adding more capacity. If it's just a few minutes, I wonder if something in your indexing service setup just makes task startup take a while.
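
If it helps narrow this down, here is a minimal diagnostic sketch (the Overlord host and port are placeholders for your own setup) that dumps the Overlord's worker, running-task, and pending-task endpoints, which is one quick way to see whether new tasks are sitting in the pending queue for lack of capacity:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class OverlordCapacityCheck
{
  public static void main(String[] args) throws Exception
  {
    // Replace host/port with your own Overlord.
    String overlord = "http://overlord.example.com:8090";
    String[] paths = {
        "/druid/indexer/v1/workers",
        "/druid/indexer/v1/runningTasks",
        "/druid/indexer/v1/pendingTasks"
    };
    for (String path : paths) {
      HttpURLConnection conn = (HttpURLConnection) new URL(overlord + path).openConnection();
      conn.setRequestMethod("GET");
      try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
        // Dump the raw JSON response for each endpoint.
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
          body.append(line);
        }
        System.out.println(path + " -> " + body);
      } finally {
        conn.disconnect();
      }
    }
  }
}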

The druid.worker.capacity is OK; we have enough free workers and no tasks in a pending state. Usually I see these transient errors immediately after the Tranquility app reads the very first message for a new hour (= a new segment). Nevertheless, the tasks are created, and I can see them immediately in the Overlord console.

I extracted some information from the logs. In tranquility.log you can see how often we get these transient errors. I grepped the Overlord and MiddleManager logs so you can see what they logged while we were getting the transient errors; I can't see any errors or anything suspicious there. In the MiddleManager logs there is always "Affirmative. Running task" with no delay. (There is almost no traffic right now, so don't be surprised that the first message for that hour arrived at 12:12:38.)

overlord.log (6.18 KB)

tranquility.log (4.6 KB)

middleManager01.log (20.5 KB)

middleManager02.log (20.4 KB)

I’m also seeing this behaviour. When there are no running tasks and I send a new event via Tranquility (embedded in my own Scala app), and it results in a new segment being created, I see these errors, even though no other events or tasks are being handled by the system. Are there some configuration tweaks I need to make?

Hey guys, Gian is out on vacation until early June, so there will be some delay before these questions get answered.

I am also getting the same sort of exception. Any help regarding this would be appreciated.

876257 [finagle/netty3-1] WARN com.metamx.tranquility.finagle.FutureRetry$ - Transient error, will try again in 24635 ms

java.io.IOException: Unable to push events to task: index_realtime_topic_2015-06-02T17:42:00.000Z_0_0 (status = TaskRunning)

at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:160) ~[tranquility_2.10-0.4.2.jar:0.4.2]

at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:146) ~[tranquility_2.10-0.4.2.jar:0.4.2]

at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$6.apply(Future.scala:863) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Try$.apply(Try.scala:13) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Future$.apply(Future.scala:90) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:863) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:863) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:824) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:823) ~[util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:100) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise$Transformer.k(Promise.scala:100) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise$Transformer.apply(Promise.scala:110) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise$Transformer.apply(Promise.scala:91) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise$$anon$2.run(Promise.scala:345) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:186) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:157) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:212) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:86) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise.runq(Promise.scala:331) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise.updateIfEmpty(Promise.scala:642) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise.update(Promise.scala:615) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.util.Promise.setValue(Promise.scala:591) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.concurrent.AsyncQueue.offer(AsyncQueue.scala:76) [util-core_2.10-6.23.0.jar:6.23.0]

at com.twitter.finagle.transport.ChannelTransport.handleUpstream(ChannelTransport.scala:45) [finagle-core_2.10-6.24.0.jar:6.24.0]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:194) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142) [netty-3.10.1.Final.jar:na]

at com.twitter.finagle.channel.ChannelStatsHandler.messageReceived(ChannelStatsHandler.scala:86) [finagle-core_2.10-6.24.0.jar:6.24.0]

at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142) [netty-3.10.1.Final.jar:na]

at com.twitter.finagle.channel.ChannelRequestStatsHandler.messageReceived(ChannelRequestStatsHandler.scala:35) [finagle-core_2.10-6.24.0.jar:6.24.0]

at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.1.Final.jar:na]

at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.1.Final.jar:na]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]

at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

com.twitter.finagle.NoBrokersAvailableException: No hosts are available for druid:firehose:topic-42-0000-0000

at com.twitter.finagle.NoStacktrace(Unknown Source) ~[na:na]

I encountered the same exception as you did.

I am also facing exactly the same exception, and only at the time when a new segment has to be created.
Any help would be appreciated.

Out of curiosity, are folks using a warming period to start up the indexing task slightly before the actual segment start time? I’m running Tranquility 0.4.2 with Samza and I did have these transient errors at the beginning of the segment, but then I configured a warming period of 5 minutes for all of my real-time ingestion tasks and I stopped getting the error. One thing to note: adding a warming period will increase the time interval over which you have overlapping ingestion tasks as the new task starts up and the old tasks expire at the conclusion of their window period.

You can configure the warming period as part of the ClusteredBeamTuning builder:

// From the Tranquility main page, Samza portion:

return DruidBeams
        .builder(timestamper)
        .curator(curator)
        .discoveryPath("/druid/discovery")
        .location(DruidLocation.create("overlord", "druid:firehose:%s", dataSource))
        .rollup(DruidRollup.create(dimensions, aggregators, QueryGranularity.MINUTE))
        .tuning(
            ClusteredBeamTuning.builder()
                               .segmentGranularity(Granularity.HOUR)
                               .windowPeriod(new Period("PT10M"))
                               .warmingPeriod(new Period("PT5M"))
                               .build()
        )
        .buildBeam();


–T

Thanks for the suggestion, but adding a ‘warming period’ doesn’t work for me.
Let me post my druid settings here:
overlord:
druid.service=overlord
druid.port=8087
common.runtime.properties:
druid.selectors.indexing.serviceName=overlord
druid.discovery.curator.path=/druid/discovery

zookeeper:
[zk: localhost:2181(CONNECTED) 5] get /druid/discovery/overlord/b7e1a803-b2a8-4e9d-8f81-0f702f5733d7
{"name":"overlord","id":"b7e1a803-b2a8-4e9d-8f81-0f702f5733d7","address":"phxdbx1189.phx.ebay.com","port":8087,"sslPort":null,"payload":null,"registrationTimeUTC":1432614864207,"serviceType":"DYNAMIC","uriSpec":null}

Full log of tranquility app:
2015-06-09 00:56:03,367 [169] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2015-06-09 00:56:03,368 [169] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString:2181=druidppzk1.prodtracking.vip.lvs.ebay.com,druidppzk2.prodtracking.vip.lvs.ebay.com:2181,druidppzk3.prodtracking.vip.lvs.ebay.com:2181,druidppzk4.prodtracking.vip.slc.ebay.com:2181,druidppzk5.prodtracking.vip.slc.ebay.com:2181,druidppzk6.prodtracking.vip.phx.ebay.com:2181,druidppzk7.prodtracking.vip.phx.ebay.com:2181,druidppzk8.prodtracking.vip.phx.ebay.com:2181,druidppzk9.prodtracking.vip.slc.ebay.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7c637cca
2015-06-09 00:56:03,376 [169] INFO com.metamx.emitter.core.LoggingEmitter - Start: started [true]
2015-06-09 00:56:03,377 [171] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server lvs3b02c-c310.stratus.lvs.ebay.com/10.132.112.134:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-09 00:56:03,385 [171] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to lvs3b02c-c310.stratus.lvs.ebay.com/10.132.112.134:2181, initiating session
2015-06-09 00:56:03,418 [171] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server lvs3b02c-c310.stratus.lvs.ebay.com/10.132.112.134:2181, sessionid = 0x34dc7c8f2110000, negotiated timeout = 60000
2015-06-09 00:56:03,418 [172] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2015-06-09 00:56:03,419 [170] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[overlord] to Set()
2015-06-09 00:56:03,468 [169] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[overlord] to Set(ServiceInstance{name=‘overlord’, id=‘b7e1a803-b2a8-4e9d-8f81-0f702f5733d7’, address=‘phxdbx1189.phx.ebay.com’, port=8087, sslPort=null, payload=null, registrationTimeUTC=1432614864207, serviceType=DYNAMIC, uriSpec=null})
2015-06-09 00:56:03,470 [169] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: overlord
2015-06-09 00:56:03,471 [169] INFO com.ebay.jetstream.tracking.sojenrich.druid.TranquilityProcessor - Tranquility processor enabled.
2015-06-09 00:56:03,615 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Creating new merged beam for identifier[overlord/pulsar_event_tq] timestamp[2015-06-09T01:00:00.000-07:00] (target = 2, actual = 0)
2015-06-09 00:56:03,619 [175] INFO com.metamx.common.scala.control$ - Creating druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0 (service = overlord)
2015-06-09 00:56:03,653 [92] INFO com.metamx.common.scala.control$ - Created druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0 (service = overlord)
2015-06-09 00:56:03,716 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-08-0000-0000] to Set()
2015-06-09 00:56:03,717 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-08-0000-0000
2015-06-09 00:56:03,724 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Created beam: {“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:0,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0”,“firehoseId”:“pulsar_event_tq-08-0000-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”}
2015-06-09 00:56:03,724 [175] INFO com.metamx.tranquility.druid.DruidBeam - Closing Druid beam for datasource[pulsar_event_tq] interval[2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00] (tasks = index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0)
2015-06-09 00:56:03,724 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Closing client for service: druid:firehose:pulsar_event_tq-08-0000-0000
2015-06-09 00:56:03,725 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - No longer monitoring service[druid:firehose:pulsar_event_tq-08-0000-0000]
2015-06-09 00:56:03,727 [175] INFO com.metamx.common.scala.control$ - Creating druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0 (service = overlord)
2015-06-09 00:56:03,759 [92] INFO com.metamx.common.scala.control$ - Created druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0 (service = overlord)
2015-06-09 00:56:03,822 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-08-0001-0000] to Set()
2015-06-09 00:56:03,823 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-08-0001-0000
2015-06-09 00:56:03,824 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Created beam: {“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:1,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0”,“firehoseId”:“pulsar_event_tq-08-0001-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”}
2015-06-09 00:56:03,824 [175] INFO com.metamx.tranquility.druid.DruidBeam - Closing Druid beam for datasource[pulsar_event_tq] interval[2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00] (tasks = index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0)
2015-06-09 00:56:03,824 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Closing client for service: druid:firehose:pulsar_event_tq-08-0001-0000
2015-06-09 00:56:03,824 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - No longer monitoring service[druid:firehose:pulsar_event_tq-08-0001-0000]
2015-06-09 00:56:03,831 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Writing new beam data to[/tranquility/beams/overlord/pulsar_event_tq/data]: {“latestTime”:“2015-06-09T01:00:00.000-07:00”,“latestCloseTime”:“2015-06-08T19:00:00.000-07:00”,“beams”:{“2015-06-09T01:00:00.000-07:00”:[{“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:0,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0”,“firehoseId”:“pulsar_event_tq-08-0000-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”},{“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:1,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0”,“firehoseId”:“pulsar_event_tq-08-0001-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”}]}}
2015-06-09 00:56:03,888 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Adding beams for identifier[overlord/pulsar_event_tq] timestamp[2015-06-09T01:00:00.000-07:00]: List(Map(interval -> 2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00, partition -> 0, tasks -> ArraySeq(Map(id -> index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0, firehoseId -> pulsar_event_tq-08-0000-0000)), timestamp -> 2015-06-09T01:00:00.000-07:00), Map(interval -> 2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00, partition -> 1, tasks -> ArraySeq(Map(id -> index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0, firehoseId -> pulsar_event_tq-08-0001-0000)), timestamp -> 2015-06-09T01:00:00.000-07:00))
2015-06-09 00:56:03,923 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-08-0000-0000] to Set()
2015-06-09 00:56:03,923 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-08-0000-0000
2015-06-09 00:56:03,957 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-08-0001-0000] to Set()
2015-06-09 00:56:03,958 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-08-0001-0000
2015-06-09 00:56:04,079 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Creating new merged beam for identifier[overlord/pulsar_event_tq] timestamp[2015-06-09T00:00:00.000-07:00] (target = 2, actual = 0)
2015-06-09 00:56:04,080 [175] INFO com.metamx.common.scala.control$ - Creating druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0 (service = overlord)
2015-06-09 00:56:04,113 [92] INFO com.metamx.common.scala.control$ - Created druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0 (service = overlord)
2015-06-09 00:56:04,148 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-07-0000-0000] to Set()
2015-06-09 00:56:04,149 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-07-0000-0000
2015-06-09 00:56:04,149 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Created beam: {“interval”:“2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00”,“partition”:0,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0”,“firehoseId”:“pulsar_event_tq-07-0000-0000”}],“timestamp”:“2015-06-09T00:00:00.000-07:00”}
2015-06-09 00:56:04,149 [175] INFO com.metamx.tranquility.druid.DruidBeam - Closing Druid beam for datasource[pulsar_event_tq] interval[2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00] (tasks = index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0)
2015-06-09 00:56:04,150 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Closing client for service: druid:firehose:pulsar_event_tq-07-0000-0000
2015-06-09 00:56:04,150 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - No longer monitoring service[druid:firehose:pulsar_event_tq-07-0000-0000]
2015-06-09 00:56:04,159 [175] INFO com.metamx.common.scala.control$ - Creating druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0 (service = overlord)
2015-06-09 00:56:04,191 [92] INFO com.metamx.common.scala.control$ - Created druid indexing task with id: index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0 (service = overlord)
2015-06-09 00:56:04,226 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-07-0001-0000] to Set()
2015-06-09 00:56:04,227 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-07-0001-0000
2015-06-09 00:56:04,227 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Created beam: {“interval”:“2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00”,“partition”:1,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0”,“firehoseId”:“pulsar_event_tq-07-0001-0000”}],“timestamp”:“2015-06-09T00:00:00.000-07:00”}
2015-06-09 00:56:04,228 [175] INFO com.metamx.tranquility.druid.DruidBeam - Closing Druid beam for datasource[pulsar_event_tq] interval[2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00] (tasks = index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0)
2015-06-09 00:56:04,228 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Closing client for service: druid:firehose:pulsar_event_tq-07-0001-0000
2015-06-09 00:56:04,228 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - No longer monitoring service[druid:firehose:pulsar_event_tq-07-0001-0000]
2015-06-09 00:56:04,230 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Writing new beam data to[/tranquility/beams/overlord/pulsar_event_tq/data]: {“latestTime”:“2015-06-09T01:00:00.000-07:00”,“latestCloseTime”:“2015-06-08T19:00:00.000-07:00”,“beams”:{“2015-06-09T01:00:00.000-07:00”:[{“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:0,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_0_0”,“firehoseId”:“pulsar_event_tq-08-0000-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”},{“interval”:“2015-06-09T01:00:00.000-07:00/2015-06-09T02:00:00.000-07:00”,“partition”:1,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T01:00:00.000-07:00_1_0”,“firehoseId”:“pulsar_event_tq-08-0001-0000”}],“timestamp”:“2015-06-09T01:00:00.000-07:00”}],“2015-06-09T00:00:00.000-07:00”:[{“interval”:“2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00”,“partition”:0,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0”,“firehoseId”:“pulsar_event_tq-07-0000-0000”}],“timestamp”:“2015-06-09T00:00:00.000-07:00”},{“interval”:“2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00”,“partition”:1,“tasks”:[{“id”:“index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0”,“firehoseId”:“pulsar_event_tq-07-0001-0000”}],“timestamp”:“2015-06-09T00:00:00.000-07:00”}]}}
2015-06-09 00:56:04,287 [175] INFO com.metamx.tranquility.beam.ClusteredBeam - Adding beams for identifier[overlord/pulsar_event_tq] timestamp[2015-06-09T00:00:00.000-07:00]: List(Map(interval -> 2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00, partition -> 0, tasks -> ArraySeq(Map(id -> index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_0_0, firehoseId -> pulsar_event_tq-07-0000-0000)), timestamp -> 2015-06-09T00:00:00.000-07:00), Map(interval -> 2015-06-09T00:00:00.000-07:00/2015-06-09T01:00:00.000-07:00, partition -> 1, tasks -> ArraySeq(Map(id -> index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0, firehoseId -> pulsar_event_tq-07-0001-0000)), timestamp -> 2015-06-09T00:00:00.000-07:00))
2015-06-09 00:56:04,322 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-07-0000-0000] to Set()
2015-06-09 00:56:04,323 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-07-0000-0000
2015-06-09 00:56:04,358 [175] INFO com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:firehose:pulsar_event_tq-07-0001-0000] to Set()
2015-06-09 00:56:04,359 [175] INFO com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:firehose:pulsar_event_tq-07-0001-0000
2015-06-09 00:56:04,406 [92] WARN com.metamx.tranquility.finagle.FutureRetry$ - Transient error, will try again in 206 ms
java.io.IOException: Unable to push events to task: index_realtime_pulsar_event_tq_2015-06-09T00:00:00.000-07:00_1_0 (status = TaskRunning)
at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:160)
at com.metamx.tranquility.druid.DruidBeam$$anonfun$4$$anonfun$apply$4$$anonfun$apply$6$$anonfun$apply$7$$anonfun$apply$3$$anonfun$applyOrElse$2.apply(DruidBeam.scala:146)
at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$6.apply(Future.scala:863)
at com.twitter.util.Try$.apply(Try.scala:13)
at com.twitter.util.Future$.apply(Future.scala:90)
at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:863)
at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:863)
at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:824)
at com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:823)
at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:100)
at com.twitter.util.Promise$Transformer.k(Promise.scala:100)
at com.twitter.util.Promise$Transformer.apply(Promise.scala:110)
at com.twitter.util.Promise$Transformer.apply(Promise.scala:91)
at com.twitter.util.Promise$$anon$2.run(Promise.scala:345)
at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:186)
at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:157)
at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:212)
at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:86)
at com.twitter.util.Promise.runq(Promise.scala:331)
at com.twitter.util.Promise.updateIfEmpty(Promise.scala:642)
at com.twitter.util.Promise.update(Promise.scala:615)
at com.twitter.util.Promise.setValue(Promise.scala:591)
at com.twitter.concurrent.AsyncQueue.offer(AsyncQueue.scala:76)
at com.twitter.finagle.transport.ChannelTransport.handleUpstream(ChannelTransport.scala:45)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:194)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142)
at com.twitter.finagle.channel.ChannelStatsHandler.messageReceived(ChannelStatsHandler.scala:86)
at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.SimpleChannelHandler.messageReceived(SimpleChannelHandler.java:142)
at com.twitter.finagle.channel.ChannelRequestStatsHandler.messageReceived(ChannelRequestStatsHandler.scala:35)
at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for druid:firehose:pulsar_event_tq-07-0001-0000
at com.twitter.finagle.NoStacktrace(Unknown Source)

thanks.

Hi all,

I too was seeing the same error, but using a warmingPeriod of 10 minutes resolved it. You can set it via DruidBeams.builder, as referred to by TJ Giuli above.
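
For reference, the only change relative to TJ's builder example above is the tuning block (a sketch; imports and the rest of the builder are the same as in that example, with PT10M being the 10-minute value mentioned here):

ClusteredBeamTuning tuning = ClusteredBeamTuning.builder()
                                                .segmentGranularity(Granularity.HOUR)
                                                .windowPeriod(new Period("PT10M"))
                                                .warmingPeriod(new Period("PT10M"))  // start each task ~10 minutes before its segment begins
                                                .build();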

Thanks,

Saurabh

Thank you, I will also try out the warming period; I missed that on my previous attempt.

UPDATE:

I confirmed that the NoBrokersAvailableException is caused by the middle manager/peon not being ready to run the task; after the task started, the errors went away. Now I can use Tranquility to ingest data, but setting a warming period didn't help avoid the initial errors.
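
For anyone else hitting this: the "broker" the client is waiting for is the per-task firehose announcement in ZooKeeper (Tranquility's DiscoResolver is built on Curator's service discovery recipe, as the log lines above show). A minimal diagnostic sketch, using a placeholder ZooKeeper connect string and one of the firehose ids from the logs above, can confirm whether the peon has announced itself yet; an empty result corresponds to the window in which NoBrokersAvailableException is logged:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.x.discovery.ServiceDiscovery;
import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;

public class FirehoseDiscoveryCheck
{
  public static void main(String[] args) throws Exception
  {
    // Placeholder connect string; use the same one your Tranquility app uses.
    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        "zk1.example.com:2181",
        new ExponentialBackoffRetry(1000, 3)
    );
    curator.start();

    // Same base path as druid.discovery.curator.path / DruidBeams.discoveryPath.
    ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
                                                              .basePath("/druid/discovery")
                                                              .client(curator)
                                                              .build();
    discovery.start();

    // Empty until the peon announces its firehose; that is the window in which
    // Tranquility logs NoBrokersAvailableException and retries.
    System.out.println(discovery.queryForInstances("druid:firehose:pulsar_event_tq-07-0001-0000"));

    discovery.close();
    curator.close();
  }
}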

thanks!

One common reason for these transient errors is a misconfiguration in Tranquility's service discovery of the overlord. Make sure the name of your overlord and the service discovery config for Tranquility are the same.
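
Concretely, using the names from the configs posted earlier in this thread, the values that have to line up look like this (a sketch only; dataSource is whatever your builder already uses):

// Druid side:
//   overlord runtime.properties:  druid.service=overlord
//   common.runtime.properties:    druid.selectors.indexing.serviceName=overlord
//                                 druid.discovery.curator.path=/druid/discovery
String dataSource = "pulsar_event_tq";  // example datasource name from this thread

// The first argument must equal the Overlord's druid.service (which is also what
// druid.selectors.indexing.serviceName should point at):
DruidLocation location = DruidLocation.create("overlord", "druid:firehose:%s", dataSource);

// And the path passed to DruidBeams.builder(...).discoveryPath(...) must equal
// druid.discovery.curator.path ("/druid/discovery" here). If any of these names
// disagree, the client never finds the Overlord or its task firehoses, and every
// push retries with NoBrokersAvailableException.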

I thought it worth mentioning here that I am pretty sure I am having a similar issue: https://groups.google.com/forum/#!topic/druid-user/UT5JNSZqAuk

I was able to reproduce this with my test environment as outlined here: https://github.com/mark1900/druid-sandbox/tree/master/kafka-storm-tranquility-druid-topology-test

My technology stack:

  • CentOs 7

  • PostgreSQL 9.2.x

  • ZooKeeper 3.4.6

  • Kafka 2.10-0.8.2.1

  • Apache Storm 0.9.4

  • Druid 0.7.3

My configuration changes:

  • zookeeper-3.4.6/conf/zoo.cfg (Direct copy from zookeeper-3.4.6/conf/zoo_sample.cfg)

  • apache-storm-0.9.4/conf/storm.yaml

  • druid-0.7.3/config/_common/common.runtime.properties

  • ~/.curlrc

  • /var/lib/pgsql/data/pg_hba.conf

  • /var/lib/pgsql/data/postgresql.conf

Hi Mark, have you fixed this issue? I had the same issue, and adding "warmingPeriod" didn't help to fix it.

On Wednesday, June 17, 2015 at 12:40:34 AM UTC+8, Mark wrote:

I have not yet been able to resolve this issue. Since my configuration uses out-of-the-box settings where possible and runs on a single node/host, I am surprised. I guess the positive news is that I always see these exceptions in my logs, so it is very easy to reproduce.

I created a GitHub Druid Issue: https://github.com/druid-io/druid/issues/1448

Yes, the worker capacity was the root cause in my situation.

On Thursday, April 16, 2015 at 11:55:37 PM UTC+8, Gian Merlino wrote:

I am still facing this issue, and on multiple occasions messages get lost because of the same error, i.e. "com.twitter.finagle.NoBrokersAvailableException: No hosts are available".

In my case I have configured the worker capacity to be 15, and at any time there are only 4 tasks running.

Is it possible that, although I have configured 15 workers, the tasks may not get launched if memory is not available on the remote box?