Overlord stopped creating new tasks

Hi Druids,
I am using the indexing service to write data from Google Cloud Pub/Sub to Druid.

The code runs without errors, and I can see that the Tranquilizer is sending data as expected.

However, tasks are not being created.

I have 1 broker, 2 colocated historical/middleManager nodes and 1 colocated overlord/coordinator node.

I do receive the following errors:

tranquilizer:

o.a.c.f.imps.CuratorFrameworkImpl : Background exception was not retry-able or retry gave up

com.metamx.emitter.core.LoggingEmitter : Event [{"feed":"alerts","timestamp":"2016-09-25T10:15:42.712Z","service":"tranquility","host":"localhost","severity":"anomaly","description":"Failed to propagate events: druid:overlord/RTBMetrics","data":{"exceptionType":"com.twitter.finagle.NoBrokersAvailableException","exceptionStackTrace":"com.twitter.finagle.NoBrokersAvailableException: No hosts are available for disco!firehose:druid:overlord:RTBMetrics-010-0001-0000, Dtab.base=, Dtab.local=\n\tat com.twitter.finagle.NoStacktrace(Unknown Source)\n","timestamp":"2016-09-25T10:00:00.000Z","beams":"MergingPartitioningBeam(DruidBeam(interval = 2016-09-25T10:00:00.000Z/2016-09-25T11:00:00.000Z, partition = 0, tasks = [index_realtime_RTBMetrics_2016-09-25T10:00:00.000Z_0_0/RTBMetrics-010-0000-0000; index_realtime_RTBMetrics_2016-09-25T10:00:00.000Z_0_1/RTBMetrics-010-0000-0001]), DruidBeam(interval = 2016-09-25T10:00:00.000Z/2016-09-25T11:00:00.000Z, partition = 1, tasks = [index_realtime_RTBMetrics_2016-09-25T10:00:00.000Z_1_0/RTBMetrics-010-0001-0000; index_realtime_RTBMetrics_2016-09-25T10:00:00.000Z_1_1/RTBMetrics-010-0001-0001]))","eventCount":1,"exceptionMessage":"No hosts are available for disco!firehose:druid:overlord:RTBMetrics-010-0001-0000, Dtab.base=, Dtab.local="}}]

overlord:

io.druid.indexing.overlord.TaskMaster - TaskMaster set a new Lifecycle without the old one being cleared! Race condition: {class=io.druid.indexing.overlord.TaskMaster}

Is it all related to ZooKeeper?

Regards,

Rotem

I have increased the ZooKeeper instance size (CPU and memory).
The Tranquilizer is sending data, but Druid is still not ingesting any new data.

broker.runtime.properties (418 Bytes)

common.runtime.properties (1.64 KB)

coordinator.runtime.properties (115 Bytes)

historical1.runtime.properties (595 Bytes)

historical1.runtime.properties (595 Bytes)

middleManager1.runtime.properties (1.15 KB)

middleManager1.runtime.properties (1.15 KB)

overlord.runtime.properties (776 Bytes)

Most likely the events being sent have timestamps that fall outside the windowPeriod, in which case Tranquility drops them.
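
To rule this out, it can help to check a few event timestamps by hand. Below is a rough sketch in Python of that kind of check. It is a simplified approximation only: the function name `within_window`, the 10-minute window and the sample timestamps are all made up for illustration, and the real acceptance logic in Tranquility also depends on your segmentGranularity.

```python
from datetime import datetime, timedelta, timezone


def within_window(event_time, now, window_period):
    """Return True if event_time is within +/- window_period of now.

    Simplified approximation of a windowPeriod check; not Tranquility's
    actual implementation.
    """
    return abs(now - event_time) <= window_period


# Hypothetical values: a 10-minute windowPeriod and a fixed "current time".
window_period = timedelta(minutes=10)
now = datetime(2016, 9, 25, 10, 15, 42, tzinfo=timezone.utc)

on_time = datetime(2016, 9, 25, 10, 10, 0, tzinfo=timezone.utc)
too_old = datetime(2016, 9, 25, 9, 0, 0, tzinfo=timezone.utc)

print(within_window(on_time, now, window_period))  # True
print(within_window(too_old, now, window_period))  # False
```

If the timestamps you are sending fail a check like this against the clock of the machine running Tranquility, the windowPeriod is a likely suspect.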

If you are streaming data from Kafka, you can use the new exactly-once Kafka indexing task, which should solve a lot of these problems:

https://imply.io/docs/latest/tutorial-kafka-indexing-service.html