Realtime node is not getting events after restart

Hi,

I restarted my realtime node, which had been working fine so far, but since the restart it doesn't seem to register any events.

During the realtime node's startup I got the following in the logs and wonder if this is the reason:

2015-09-04T14:09:01,248 ERROR [chief-bid_stats[0]] io.druid.segment.realtime.RealtimeManager - RuntimeException aborted realtime processing[bid_stats]: {class=io.druid.segment.realtime.RealtimeManager, exceptionType=class com.metamx.common.ISE, exceptionMessage=hydrant[FireHydrant{index=null, queryable=io.druid.segment.ReferenceCountingSegment@48a7ea30, count=1}] not the right count[0]}

com.metamx.common.ISE: hydrant[FireHydrant{index=null, queryable=io.druid.segment.ReferenceCountingSegment@48a7ea30, count=1}] not the right count[0]
    at io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:91) ~[druid-server-0.8.0-rc1.jar:0.8.0-rc1]
    at io.druid.segment.realtime.plumber.RealtimePlumber.bootstrapSinksFromDisk(RealtimePlumber.java:652) ~[druid-server-0.8.0-rc1.jar:0.8.0-rc1]
    at io.druid.segment.realtime.plumber.RealtimePlumber.startJob(RealtimePlumber.java:180) ~[druid-server-0.8.0-rc1.jar:0.8.0-rc1]
    at io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:244) [druid-server-0.8.0-rc1.jar:0.8.0-rc1]

2015-09-04T14:09:01,256 INFO [chief-bid_stats[0]] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2015-09-04T14:09:01.251Z","service":"realtime","host":"xxx.xxx.xxx.xxx:8084","severity":"component-failure","description":"RuntimeException aborted realtime processing[bid_stats]","data":{"class":"io.druid.segment.realtime.RealtimeManager","exceptionType":"com.metamx.common.ISE","exceptionMessage":"hydrant[FireHydrant{index=null, queryable=io.druid.segment.ReferenceCountingSegment@48a7ea30, count=1}] not the right count[0]","exceptionStackTrace":"com.metamx.common.ISE: hydrant[FireHydrant{index=null, queryable=io.druid.segment.ReferenceCountingSegment@48a7ea30, count=1}] not the right count[0]\n\tat io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:91)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.bootstrapSinksFromDisk(RealtimePlumber.java:652)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.startJob(RealtimePlumber.java:180)\n\tat io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:244)\n"}}]

Exception in thread "chief-bid_stats[0]" com.metamx.common.ISE: hydrant[FireHydrant{index=null, queryable=io.druid.segment.ReferenceCountingSegment@48a7ea30, count=1}] not the right count[0]
    at io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:91)
    at io.druid.segment.realtime.plumber.RealtimePlumber.bootstrapSinksFromDisk(RealtimePlumber.java:652)
    at io.druid.segment.realtime.plumber.RealtimePlumber.startJob(RealtimePlumber.java:180)
    at io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:244)

4.054: [GC4.054: [ParNew: 838912K->35907K(943744K), 0.0278460 secs] 838912K->35907K(10380928K), 0.0279240 secs] [Times: user=0.19 sys=0.01, real=0.03 secs]

Can this be the reason? If so, what am I doing wrong?

If not, what else could it be? Where should I start debugging?

Thanks for your help.

I deleted all the segments in /tmp/realtime/basePersist/bid_stats and this error disappeared.
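
In case it helps others, the cleanup I did by hand amounts to roughly this sketch (the path is the basePersistDirectory from my spec, and the node was stopped before removing anything):

import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

// Recursively removes the on-disk persist data for one datasource so the
// realtime node starts with a clean slate. Stop the node before running this.
public class ClearBasePersist
{
  public static void main(String[] args) throws IOException
  {
    // Path taken from my setup; adjust to your own basePersistDirectory.
    Path persistDir = Paths.get("/tmp/realtime/basePersist/bid_stats");

    if (!Files.exists(persistDir)) {
      System.out.println("Nothing to clean at " + persistDir);
      return;
    }

    // Delete children before parents.
    try (Stream<Path> paths = Files.walk(persistDir)) {
      paths.sorted(Comparator.reverseOrder())
           .forEach(p -> p.toFile().delete());
    }
    System.out.println("Removed " + persistDir);
  }
}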

But now I'm getting a new error, and I still have the same problem: no data seems to be stored in the realtime node:

2015-09-04T14:42:49,077 INFO [chief-bid_stats[0]] io.druid.segment.realtime.RealtimeManager - Firehose acquired!

2015-09-04T14:42:49,106 INFO [druid_d-bro-1441377768682-5ab22f2e-leader-finder-thread] kafka.client.ClientUtils$ - Fetching metadata from broker id:2,host:xxx.xxx.xxx.xxx,port:9092 with correlation id 0 for 1 topic(s) Set(bid_stats)

2015-09-04T14:42:49,132 INFO [druid_d-bro-1441377768682-5ab22f2e-leader-finder-thread] kafka.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1441377768724] Added fetcher for partitions ArrayBuffer([[bid_stats,0], initOffset 6412357311 to broker id:0,host:136.243.36.142,port:9092] , [[bid_stats,1], initOffset 121747 to broker id:0,host:xxx.xxx.xxx.xxx,port:9092] )

2015-09-04T14:42:49,158 ERROR [ConsumerFetcherThread-druid_d-bro-1441377768682-5ab22f2e-0-0] kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-druid_d-bro-1441377768682-5ab22f2e-0-0], Current offset 121747 for partition [bid_stats,1] out of range; reset offset to 52706909

Hi Roman,

That error looks more like segments weren't cleaned up correctly across the restart, which can leave Druid in a bit of a funky state. I'm curious what steps you took that produced those exception logs. Also, what version of Druid is this? The second error you posted is a Kafka error, so you may need to ask the Kafka folks what is happening there.
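
On the offset-out-of-range message specifically: the Kafka 0.8 high-level consumer falls back to whatever auto.offset.reset says whenever its committed offset no longer exists on the broker, which is what that log line shows. If you are using the kafka-eight firehose, those settings usually live under consumerProps in your ioConfig; a rough sketch (hostnames and group id are placeholders, not your actual config):

import java.util.Properties;

// Sketch of the consumer settings that control where the Kafka 0.8 high-level
// consumer resumes from when its committed offset is out of range.
// Values below are placeholders, not your actual config.
public class KafkaConsumerProps
{
  public static Properties build()
  {
    Properties props = new Properties();
    props.put("zookeeper.connect", "zk-host:2181");   // placeholder
    props.put("group.id", "druid-bid_stats");         // placeholder
    // "smallest" replays from the earliest offset the broker still has;
    // "largest" (the default) skips ahead to the newest messages.
    props.put("auto.offset.reset", "smallest");
    return props;
  }
}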

In general, you should enable metrics for real-time ingestion, which will tell you why events are being rejected or otherwise not being ingested. Later stable versions of Druid should have these metrics emitted periodically in the logs.
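
You can also sanity-check whether any rows are reaching the node at all by querying it directly. A minimal sketch of a timeBoundary query against your realtime node (host and port are taken from your log; adjust as needed):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Asks the realtime node for the min/max event timestamps it currently holds
// for the datasource. An empty array in the response means it has no data yet.
public class CheckRealtimeData
{
  public static void main(String[] args) throws Exception
  {
    String query = "{\"queryType\":\"timeBoundary\",\"dataSource\":\"bid_stats\"}";

    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://xxx.xxx.xxx.xxx:8084/druid/v2/").openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(query.getBytes(StandardCharsets.UTF_8));
    }

    // Print the raw JSON response.
    try (Scanner scanner = new Scanner(conn.getInputStream(), "UTF-8")) {
      while (scanner.hasNextLine()) {
        System.out.println(scanner.nextLine());
      }
    }
  }
}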

– FJ