Tasks are submitted successfully using Tranquility but are not getting completed

Hi team,

Tasks submitted through Tranquility are accepted but are not getting completed. Tranquility log:

2018-11-14 10:41:15,874 [ClusteredBeam-ZkFuturePool-537e14f5-b0ac-4710-9736-a13aef064f57] INFO c.m.t.finagle.FinagleRegistry - Created client for service: disco!firehose:overlord:wikipedia-minute-007-0000-0000

2018-11-14 10:42:32,125 [Hashed wheel timer #1] WARN c.m.tranquility.beam.ClusteredBeam - Emitting alert: [anomaly] Failed to propagate events: overlord/wikipedia-minute

{
  "eventCount" : 1,
  "timestamp" : "2018-10-14T07:00:00.000Z",
  "beams" : "MergingPartitioningBeam(DruidBeam(interval = 2018-10-14T07:00:00.000Z/2018-10-14T08:00:00.000Z, partition = 0, tasks = [index_realtime_wikipedia-minute_2018-10-14T07:00:00.000Z_0_0/wikipedia-minute-007-0000-0000]))"
}

com.twitter.finagle.NoBrokersAvailableException: No hosts are available for disco!firehose:overlord:wikipedia-minute-007-0000-0000, Dtab.base=, Dtab.local=

at com.twitter.finagle.NoStacktrace(Unknown Source) ~[na:na]

2018-11-14 10:42:32,558 [Hashed wheel timer #1] INFO c.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2018-11-14T10:42:32.522Z","service":"tranquility","host":"localhost","severity":"anomaly","description":"Failed to propagate events: overlord/wikipedia-minute","data":{"exceptionType":"com.twitter.finagle.NoBrokersAvailableException","exceptionStackTrace":"com.twitter.finagle.NoBrokersAvailableException: No hosts are available for disco!firehose:overlord:wikipedia-minute-007-0000-0000, Dtab.base=, Dtab.local=\n\tat com.twitter.finagle.NoStacktrace(Unknown Source)\n","timestamp":"2018-10-14T07:00:00.000Z","beams":"MergingPartitioningBeam(DruidBeam(interval = 2018-10-14T07:00:00.000Z/2018-10-14T08:00:00.000Z, partition = 0, tasks = [index_realtime_wikipedia-minute_2018-10-14T07:00:00.000Z_0_0/wikipedia-minute-007-0000-0000]))","eventCount":1,"exceptionMessage":"No hosts are available for disco!firehose:overlord:wikipedia-minute-007-0000-0000, Dtab.base=, Dtab.local="}}]

2018-11-14 10:43:02,625 [Curator-ServiceCache-0] INFO c.m.c.s.net.finagle.DiscoResolver - Updating instances for service[firehose:overlord:wikipedia-minute-007-0000-0000] to Set(ServiceInstance{name='firehose:overlord:wikipedia-minute-007-0000-0000', id='4358c074-701c-40bb-9b8b-a55b7fce4a07', address='127.0.0.1', port=8100, sslPort=null, payload=null, registrationTimeUTC=1542192182493, serviceType=DYNAMIC, uriSpec=null})

P3650D^C
2018-11-14 11:04:17,060 [Thread-11] INFO c.metamx.common.lifecycle.Lifecycle - Running shutdown hook

2018-11-14 11:04:17,316 [Thread-11] INFO o.e.jetty.server.ServerConnector - Stopped ServerConnector@241fbec{HTTP/1.1}{0.0.0.0:8200}

2018-11-14 11:04:17,469 [Thread-11] INFO c.m.c.l.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.tranquility.tranquilizer.Tranquilizer.stop()] on object[Tranquilizer(com.metamx.tranquility.beam.TransformingBeam@11787b64)].

2018-11-14 11:04:17,480 [Thread-11] INFO c.metamx.tranquility.druid.DruidBeam - Closing Druid beam for datasource[wikipedia-minute] interval[2018-10-14T07:00:00.000Z/2018-10-14T08:00:00.000Z] (tasks = index_realtime_wikipedia-minute_2018-10-14T07:00:00.000Z_0_0)

2018-11-14 11:04:17,516 [Thread-11] INFO c.m.t.finagle.FinagleRegistry - Closing client for service: disco!firehose:overlord:wikipedia-minute-007-0000-0000

2018-11-14 11:04:17,637 [Thread-11] INFO c.m.c.s.net.finagle.DiscoResolver - No longer monitoring service[firehose:overlord:wikipedia-minute-007-0000-0000]

2018-11-14 11:04:17,660 [Thread-11] INFO c.m.t.finagle.FinagleRegistry - Closing client for service: disco!overlord

2018-11-14 11:04:17,664 [Thread-11] INFO c.m.c.s.net.finagle.DiscoResolver - No longer monitoring service[overlord]

2018-11-14 11:04:17,899 [Thread-11] INFO c.m.c.l.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.common.scala.net.curator.Disco.stop()] on object[com.metamx.common.scala.net.curator.Disco@6d963d70].

2018-11-14 11:04:17,900 [Curator-Framework-0] INFO o.a.c.f.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting

2018-11-14 11:04:17,933 [Thread-11] INFO org.apache.zookeeper.ZooKeeper - Session: 0x16711b05bbb0013 closed

ZooKeeper log:

2018-11-14 10:38:43,239 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:46970 which had sessionid 0x16711b05bbb0012

2018-11-14 10:39:04,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x16711b05bbb0012, timeout of 20000ms exceeded

2018-11-14 10:39:04,016 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x16711b05bbb0012

2018-11-14 10:40:36,807 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:NIOServerCnxnFactory@197] - Accepted socket connection from /0:0:0:0:0:0:0:1:46986

2018-11-14 10:40:36,898 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:ZooKeeperServer@868] - Client attempting to establish new session at /0:0:0:0:0:0:0:1:46986

2018-11-14 10:40:36,921 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x16711b05bbb0013 with negotiated timeout 20000 for client /0:0:0:0:0:0:0:1:46986

2018-11-14 10:42:52,414 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:38654

2018-11-14 10:42:52,527 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:ZooKeeperServer@868] - Client attempting to establish new session at /127.0.0.1:38654

2018-11-14 10:42:52,531 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x16711b05bbb0014 with negotiated timeout 30000 for client /127.0.0.1:38654

2018-11-14 11:04:17,918 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x16711b05bbb0013

2018-11-14 11:04:17,926 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9182:NIOServerCnxn@1007] - Closed socket connection for client /0:0:0:0:0:0:0:1:46986 which had sessionid 0x16711b05bbb0013

2018-11-14 11:07:09,014 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.

2018-11-14 11:07:09,017 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

Thanks

Kranthi Kumar

middlemanager.log (27.1 KB)

overlord.log (39.9 KB)

wikipedia-server.json (2.14 KB)

wikiticker-2015-09-12-sampled.json (394 Bytes)

Is your overlord's service name, the one defined in runtime.properties under the druid.service configuration key, really just "overlord" and not "druid/overlord"?

Hi Dan,

The value of druid.service in the overlord's runtime.properties is just "overlord". I have pasted the overlord, MiddleManager, and common runtime properties below for reference.

Overlord config:

# Default host: localhost. Default port: 8090. If you run each node type on its own node in production, you should override these values to be IP:8080
druid.host=127.0.0.1
druid.port=8090
druid.service=overlord

# Run the overlord in local mode with a single peon to execute tasks
# This is not recommended for production.
druid.indexer.queue.startDelay=PT0M

# This setting is too small for real production workloads
druid.indexer.runner.javaOpts=-server -Xmx12g -Xms12g -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=35 -Djava.awt.headless=true -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

# These settings are also too small for real production workloads
# Please see our recommended production settings in the docs (http://druid.io/docs/latest/Production-Cluster-Configuration.html)
druid.indexer.fork.property.druid.processing.numThreads=10
druid.indexer.fork.property.druid.computation.buffer.size=100000000

druid.emitter.logging.logLevel=info
druid.indexer.task.restoreTasksOnRestart=true

# Uncomment following property if you are running Middle Manager
#druid.indexer.runner.type=remote

MiddleManager runtime.properties:

druid.service=middleManager
druid.port=8091

# Number of tasks per middleManager
druid.worker.capacity=100

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx12g -Xms12g -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=35 -Djava.awt.headless=true -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

#druid.indexer.task.baseTaskDir=/mnt/mfs/venator/druid/task
druid.indexer.task.baseTaskDir=/app/druid/task

druid.indexer.fork.property.druid.processing.numThreads=10
druid.indexer.fork.property.druid.computation.buffer.size=100000000
druid.indexer.task.restoreTasksOnRestart=true

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=4

druid.emitter.logging.logLevel=info

common.runtime.properties:

# Also, for production to use mysql add, "io.druid.extensions:mysql-metadata-storage"
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight", "io.druid.extensions:mysql-metadata-storage"]
druid.extensions.localRepository=extensions-repo

# Zookeeper
#druid.zk.service.host:9182=dfw-kafka-broker03-catdev.prod.walmart.com,dfw-kafka-broker04-catdev.prod.walmart.com:9182,dfw-kafka-broker05-catdev.prod.walmart.com:9182
druid.zk.service.host=127.0.0.1:9182
druid.discovery.curator.path=/test/discovery

# Metadata Storage (use something like mysql in production by uncommenting properties below)
# by default druid will use derby
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid?characterEncoding=UTF-8
#druid.metadata.storage.connector.user=root
#druid.metadata.storage.connector.password=Admin@123

# Deep storage (local filesystem for examples - don't use this in production)
druid.storage.type=local
druid.storage.storageDirectory=/tmp/druid/localStorage

# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=536870912

# Indexing service discovery
druid.selectors.indexing.serviceName=overlord

# Coordinator Service Discovery
druid.selectors.coordinator.serviceName=coordinator

# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)
druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Metrics logging (disabled for examples - change this to logging or http in production)
druid.emitter=noop

Thanks

Kranthi Kumar

Before you submit the task, can you check whether there are any services under the znode defined here?

druid.discovery.curator.path=/test/discovery
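For example, a short script like the one below will list whatever is announced under that path. This is only a minimal sketch: it assumes the kazoo Python client is installed (pip install kazoo) and reuses the ZooKeeper address (127.0.0.1:9182) and discovery path (/test/discovery) from the configs you pasted.

# List the services announced under druid.discovery.curator.path.
# Assumes: pip install kazoo, ZooKeeper reachable at 127.0.0.1:9182.
from kazoo.client import KazooClient

DISCOVERY_PATH = "/test/discovery"  # druid.discovery.curator.path

zk = KazooClient(hosts="127.0.0.1:9182")
zk.start()
try:
    if zk.exists(DISCOVERY_PATH) is None:
        print("No znode at", DISCOVERY_PATH, "- nothing has announced itself yet")
    else:
        for service in zk.get_children(DISCOVERY_PATH):
            instances = zk.get_children(DISCOVERY_PATH + "/" + service)
            print(service, "->", len(instances), "instance(s)")
finally:
    zk.stop()

If the overlord has announced itself correctly, you would expect to see an entry named overlord there (matching your druid.selectors.indexing.serviceName), and, while an indexing task is running, a firehose:overlord:wikipedia-minute-... entry for the task's firehose.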

Hi Dan,

Are you asking whether druid.zk.paths.base has been set? I didn't set the druid.zk.paths.base config value.

If it is empty, it means the overlord and middleManager nodes are not running yet. You can also confirm by going to (or curl'ing) http://<overlord host>:8090/status.
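If you want to script that check, here is a minimal sketch using only the Python standard library. The host and port (127.0.0.1:8090) come from the overlord runtime.properties above; the /druid/indexer/v1/workers call is an extra check that, as far as I know, only lists MiddleManagers when the overlord runs with druid.indexer.runner.type=remote.

# Quick overlord check over HTTP, standard library only.
# 127.0.0.1:8090 comes from the overlord runtime.properties above.
# /druid/indexer/v1/workers should list registered MiddleManagers, but only
# when the overlord is running with druid.indexer.runner.type=remote.
import json
import urllib.request

OVERLORD = "http://127.0.0.1:8090"

for path in ("/status", "/druid/indexer/v1/workers"):
    try:
        with urllib.request.urlopen(OVERLORD + path, timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            print(path, "->", json.dumps(body, indent=2))
    except Exception as exc:  # connection refused, HTTP errors, etc.
        print(path, "->", exc)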

Response from http://<overlord host>:8090/status:

 {"version":"0.8.3","modules":[{"name":"io.druid.metadata.storage.mysql.MySQLMetadataStorageModule","artifact":"mysql-metadata-storage","version":"0.8.3"},{"name":"io.druid.examples.ExamplesDruidModule","artifact":"druid-examples","version":"0.8.3"},{"name":"io.druid.firehose.kafka.KafkaEightDruidModule","artifact":"druid-kafka-eight","version":"0.8.3"}],"memory":{"maxMemory":1073741824,"totalMemory":1073741824,"freeMemory":854087664,"usedMemory":219654160}}

Regards
Kranthi Kumar