Exception in realtime node

My realtime nodes are unable to hand off their segments. I am using Druid 0.8.3.

I see the following exception in the logs:

2016-04-09 19:10:25,805 ERROR i.d.s.r.p.CoordinatorBasedSegmentHandoffNotifier [coordinator_handoff_scheduled_0] Exception while checking handoff for dataSource[dripstat] Segment[SegmentDescriptor{interval=2016-04-09T17:00:00.000Z/2016-04-09T18:00:00.000Z, version='2016-04-09T17:00:00.000Z', partitionNumber=0}], Will try again after [60000]secs

java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.jboss.netty.handler.timeout.ReadTimeoutException

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:98) ~[druid-server-0.8.3.jar:0.8.3]

at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier.checkForSegmentHandoffs(CoordinatorBasedSegmentHandoffNotifier.java:101) [druid-server-0.8.3.jar:0.8.3]

at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier$1.run(CoordinatorBasedSegmentHandoffNotifier.java:86) [druid-server-0.8.3.jar:0.8.3]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_77]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_77]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_77]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_77]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_77]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_77]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]

Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.handler.timeout.ReadTimeoutException

at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]

at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:82) ~[druid-server-0.8.3.jar:0.8.3]

… 9 more

Caused by: org.jboss.netty.handler.timeout.ReadTimeoutException

at org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84) ~[netty-3.10.4.Final.jar:?]

at com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:176) ~[http-client-1.0.4.jar:?]

at com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14) ~[http-client-1.0.4.jar:?]

at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:68) ~[druid-server-0.8.3.jar:0.8.3]

… 9 more

However, the coordinator console shows that the coordinator can see the realtime node, and all queries are working fine.

How can I resolve this?
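One way to narrow down a read timeout like the one in the trace is to issue a similar request by hand from the realtime node and see how long the coordinator takes to answer. The snippet below is only a rough sketch: the URL path is an assumption based on the coordinator's datasource/interval `serverview` API and should be checked against your Druid version, and the host name `coordinator-host:8081` in the usage comment is a placeholder.

```python
import time
import urllib.request
from urllib.parse import quote


def serverview_url(coordinator, data_source, interval):
    """Build the URL for the coordinator's server view of one interval.

    Assumption: the path below matches what the handoff check requests;
    verify it against the coordinator API of your Druid version.
    """
    return (
        "http://{}/druid/coordinator/v1/datasources/{}/intervals/{}"
        "/serverview?partial=true"
    ).format(coordinator, quote(data_source, safe=""), interval.replace("/", "_"))


def time_request(url, timeout_s=15.0):
    """Return (HTTP status or error text, elapsed seconds) for one GET."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            return resp.status, time.monotonic() - start
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        return repr(exc), time.monotonic() - start


# Usage (placeholder host; run this from the realtime node):
# url = serverview_url(
#     "coordinator-host:8081",
#     "dripstat",
#     "2016-04-09T17:00:00.000Z/2016-04-09T18:00:00.000Z",
# )
# print(time_request(url))
```

If the request takes a long time or fails from the realtime node but returns quickly from another machine, that points at the network path between the realtime node and the coordinator rather than at Druid itself.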

I fixed the network issue, but I now see the following messages in the realtime node logs:

2016-04-09 20:22:25,367 INFO i.d.s.r.p.CoordinatorBasedSegmentHandoffNotifier [coordinator_handoff_scheduled_0] Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-04-09T19:00:00.000Z/2016-04-09T20:00:00.000Z, version='2016-04-09T19:00:00.000Z', partitionNumber=0}]]

The log is full of these 'Still waiting for Handoff for Segments' messages.

Hi Prashant,
My guess is that, because of the network issue, the segment is not being pushed to deep storage at all.

Do you see a segment metadata entry in the DB for the above segment?

If not, check the task logs or overlord logs for any exceptions related to segment publishing.

If the metadata entry is present in the DB, make sure the historical nodes have enough free space to load the segment, and check that there are no exceptions in the coordinator or historical logs while the segment is loading.
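As a rough illustration of the metadata check, the lookup could look something like the sketch below. It makes several assumptions: that the segments table uses the default name `druid_segments` with `dataSource`, `start`, `end` and `used` columns, and that `sqlite3` stands in for the real metadata store driver (with MySQL or PostgreSQL the connection call, placeholder style and identifier quoting will differ). The helper `find_segment_rows` is hypothetical and not part of Druid.

```python
import sqlite3


def find_segment_rows(conn, data_source, start, end, table="druid_segments"):
    """Return (id, used) rows for one data source and interval.

    Assumes a Druid-style segments table; sqlite3 is only a stand-in
    for whichever database backs the metadata store.
    """
    # "end" is quoted because it is a reserved word in several databases.
    sql = (
        "SELECT id, used FROM {} "
        'WHERE dataSource = ? AND start = ? AND "end" = ?'
    ).format(table)
    return conn.execute(sql, (data_source, start, end)).fetchall()


# Usage (hypothetical connection to a copy of the metadata store):
# conn = sqlite3.connect("metadata-copy.db")
# rows = find_segment_rows(
#     conn, "dripstat", "2016-04-09T19:00:00.000Z", "2016-04-09T20:00:00.000Z"
# )
# An empty result would suggest the segment was never published.
```

An empty result corresponds to the "no metadata entry" case above, so the next place to look would be the task or overlord logs. A row that is present and marked as used corresponds to the second case, where the coordinator and historical nodes are the ones to inspect.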