Tranquility to Druid 0.8.3-RC3 Segment Handoff Errors

For reference, I'm on Tranquility 0.6.4 and Druid 0.8.3-rc3.

My middle manager peons aren't handing off segments correctly. The segments are persisted and loaded by the historical servers, but the middle manager never drops them. The peons then spin their wheels forever, logging the errors below once a minute. The errors start appearing in the peon logs after persist-n-merge finishes:

2016-01-08T01:49:57,216 ERROR [coordinator_handoff_scheduled_0] io.druid.curator.discovery.ServerDiscoverySelector - No server instance found
2016-01-08T01:49:57,217 ERROR [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Exception while checking handoff for dataSource[events_agg_test] Segment[SegmentDescriptor{interval=2016-01-07T21:00:00.000Z/2016-01-07T22:00:00.000Z, version='2016-01-07T22:39:53.905Z', partitionNumber=0}], Will try again after [60000]secs
com.metamx.common.ISE: Cannot find instance of coordinator
	at io.druid.client.coordinator.CoordinatorClient.baseUrl(CoordinatorClient.java:108) ~[druid-server-0.8.3-rc3.jar:0.8.3-rc3]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:74) ~[druid-server-0.8.3-rc3.jar:0.8.3-rc3]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier.checkForSegmentHandoffs(CoordinatorBasedSegmentHandoffNotifier.java:101) [druid-server-0.8.3-rc3.jar:0.8.3-rc3]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier$1.run(CoordinatorBasedSegmentHandoffNotifier.java:86) [druid-server-0.8.3-rc3.jar:0.8.3-rc3]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_51]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_51]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_51]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
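
To see which segments are stuck, something like the following could pull the datasource and segment details out of the handoff errors. This is only a minimal sketch that assumes the log line format shown above; the regex and the function name `stuck_segments` are illustrative and not part of Druid.

```python
import re

# Pattern modeled on the CoordinatorBasedSegmentHandoffNotifier error line above.
# Assumption: other Druid versions may format this message differently.
HANDOFF_ERROR = re.compile(
    r"Exception while checking handoff for dataSource\[(?P<datasource>[^\]]+)\] "
    r"Segment\[SegmentDescriptor\{interval=(?P<interval>[^,]+), "
    r"version='(?P<version>[^']+)', partitionNumber=(?P<partition>\d+)\}\]"
)


def stuck_segments(log_lines):
    """Return distinct (datasource, interval, version, partition) tuples
    found in handoff error lines, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = HANDOFF_ERROR.search(line)
        if match is None:
            continue  # stack trace lines and unrelated messages are skipped
        key = (
            match["datasource"],
            match["interval"],
            match["version"],
            int(match["partition"]),
        )
        if key not in seen:
            seen.append(key)
    return seen
```

Feeding it the lines of a peon log (for example `stuck_segments(open(path))`, where `path` is wherever your peon log lives) gives one entry per segment that is still waiting on handoff, however many times the error repeats.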

I also found some of these in the coordinator logs. I'm not sure whether they are related, but apart from these the coordinator logs look completely normal:

2016-01-08T19:47:47,884 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Asking server peon[/druid/prod/loadQueue/druid-historical-01:4203] to load segment[events_test_2016-01-08T19:15:00.000Z_2016-01-08T19:30:00.000Z_2016-01-08T19:15:17.758Z]
2016-01-08T19:47:47,884 INFO [Coordinator-Exec--0] io.druid.server.coordinator.LoadQueuePeon - Server[/druid/prod/loadQueue/druid-historical-01:4203] skipping doNext() because something is currently loading[events_test_2016-01-08T19:15:00.000Z_2016-01-08T19:30:00.000Z_2016-01-08T19:15:17.758Z_3].


In addition, in my coordinator console the datasources show up everywhere except in the indexing panel, where I see just one datasource, listed as undefined. In the indexing console, the datasources look fine in the payloads and logs.

Any help is much appreciated!

Michael

Hey Michael,

You are probably running into this: https://github.com/druid-io/druid/pull/2015#issuecomment-166394793

A reminder will be in the release notes for 0.8.3.
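
Since the exception is "Cannot find instance of coordinator", one thing worth checking is whether the service name the peons look up matches the name the coordinator announces. Here is a rough sketch of that check, assuming the lookup name is set by `druid.selectors.coordinator.serviceName` and the coordinator's name by `druid.service` (please confirm both against the configuration docs for your version). The helper names are made up for illustration.

```python
# Assumption: this is the property peons use to discover the coordinator.
SELECTOR_KEY = "druid.selectors.coordinator.serviceName"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def coordinator_selector_matches(coordinator_text, peon_text):
    """True only if both properties are set explicitly and are equal.

    An unset property counts as a mismatch here; this sketch does not
    model whatever defaults Druid applies.
    """
    service = parse_properties(coordinator_text).get("druid.service")
    selector = parse_properties(peon_text).get(SELECTOR_KEY)
    return service is not None and service == selector
```

Pass it the contents of the coordinator's runtime.properties and of whichever properties file your peons actually pick up. A `False` result means the two names differ or one of them is not set explicitly.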

That sounds like exactly our issue. Thanks!

Michael

Hi Michael,

Did you resolve the issue?

Thanks,

Srini