Error on Historical nodes, missing index.drd file

Hi,

I’ve a strange behavior in the last days in the Historical nodes:

I’ve exceptions about missing index.drd file.

My Druid version il 0.9.1.1

Can someone help me to understand this error and how to solve it.

Here some logs:

2017-02-19 22:38:09,884 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Completed request [LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]

2017-02-19 22:38:09,885 ERROR i.d.s.c.ZkCoordinator [ZkCoordinator-0] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00], segment=DataSegment{size=643323273, shardSpec=LinearShardSpec{partitionNum=0}, metrics=[count, rtb_count, bid_response, wins, clicks, clicks_global, conversion, clickfraud_count, conversionfraud_count, bid_spent, min_bid, max_bid, min_win, events, bid_time_total, max_bid_time, min_bid_time, max_win, win_spent, company_spent, advertiser_spent, max_advertiser_bag, total_advertiser_bad, total_advertiser_transaction, revenue, peer39_price, track_revenue], dimensions=[account_executive, account_manager, account_onboarding, account_sales, advertiser_bag, advertiser_id, advertiser_price, advertiser_transaction, app_bundle, banner_size, bid_floor, bid_price, bid_time, browser_id, browser_version_id, campaign_id, campaign_type, carrier_id, categories, city_id, company_id, company_price, creative_id, country_id, currency, conversion_type, device_make_id, device_model_id, device_os_id, device_osv_id, device_type_id, domain, event_id, exchange_id, native_type, offer_id, optimization_id, optimization_status, place_id, place_name, pub_id, pub_name, publisher_id, publisher_src, publisher_src1, publisher_src2, publisher_src3, publisher_src4, publisher_src5, supply_type, region_id, rotator_id, rtb, sec_counter, smartlink_id, url_id, win_price], version=‘2017-02-19T20:00:00.000-05:00’, loadSpec={type=local, path=/s3/dataStorage/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.zip}, interval=2017-02-19T20:00:00.000-05:00/2017-02-19T21:00:00.000-05:00, dataSource=‘buck_bidding’, binaryVersion=‘9’}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.1.1.jar:0.9.1.1]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.10.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772) [curator-recipes-2.10.0.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]

Caused by: io.druid.segment.loading.SegmentLoadingException: /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.drd (No such file or directory)

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:52) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]

… 18 more

Caused by: java.io.FileNotFoundException: /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.drd (No such file or directory)

at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_73]

at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_73]

at java.io.FileInputStream.(FileInputStream.java:138) ~[?:1.8.0_73]

at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:43) ~[druid-api-0.9.1.1.jar:0.9.1.1]

at io.druid.segment.IndexIO.loadIndex(IndexIO.java:211) ~[druid-processing-0.9.1.1.jar:0.9.1.1]

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:49) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]

… 18 more

2017-02-19 22:38:09,885 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] was removed
2017-02-19 22:38:39,882 WARN i.d.s.c.ZkCoordinator [ZkCoordinator-Exec–0] Unable to delete segmentInfoCacheFile[/druid/historical/indexCache/info_dir/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]

``

Then after some minutes:

2017-02-19 22:42:44,743 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:/druid-aws/loadQueue/10.80.4.55:8081 serverPath:/druid-aws/loadQueue/10.80.4.55:8081 finished:false header:: 3470,12 replyHeader:: 3470,107378822171,0 request:: '/druid-aws/loadQueue/10.80.4.55:8081,T response:: v{'buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00},s{107375333209,107375333209,1482155297346,1482155297346,0,71969,0,0,0,1,107378822171}

2017-02-19 22:42:44,744 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 serverPath:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 finished:false header:: 3471,4 replyHeader:: 3471,107378822171,0 request:: '/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00,T response:: #7b22616374696f6e223a226c6f6164222c2264617461536f75726365223a226275636b5f62696464696e67222c22696e74657276616c223a22323031372d30322d31395432303a30303a30302e3030302d30353a30302f323031372d30322d31395432313a30303a30302e3030302d30353a3030222c2276657273696f6e223a22323031372d30322d31395432303a30303a30302e3030302d30353a3030222c226c6f616453706563223a7b2274797065223a226c6f63616c222c2270617468223a222f73332f6461746153746f726167652f6275636b5f62696464696e672f323031372d30322d31395432303a30303a30302e3030302d30353a30305f323031372d30322d31395432313a30303a30302e3030302d30353a30302f323031372d30322d31395432303a30303a30302e3030302d30353a30302f302f696e6465782e7a6970227d2c2264696d656e73696f6e73223a226163636f756e745f6578656375746976652c6163636f756e745f6d616e616765722c6163636f756e745f6f6e626f617264696e672c6163636f756e745f73616c65732c616476657274697365725f6261672c616476657274697365725f69642c616476657274697365725f70726963652c616476657274697365725f7472616e73616374696f6e2c6170705f62756e646c652c62616e6e65725f73697a652c6269645f666c6f6f722c6269645f70726963652c6269645f74696d652c62726f777365725f69642c62726f777365725f76657273696f6e5f69642c63616d706169676e5f69642c63616d706169676e5f747970652c636172726965725f69642c63617465676f726965732c636974795f69642c636f6d70616e795f69642c636f6d70616e795f70726963652c63726561746976655f69642c636f756e7472795f69642c63757272656e63792c636f6e76657273696f6e5f747970652c6465766963655f6d616b655f69642c6465766963655f6d6f64656c5f69642c6465766963655f6f735f69642c6465766963655f6f73765f69642c6465766963655f747970655f69642c646f6d61696e2c6576656e745f69642c65786368616e67655f69642c6e61746976655f747970652c6f666665725f69642c6f7074696d697a6174696f6e5f69642c6f7074696d697a6174696f6e5f7374617475732c706c6163655f69642c706c6163655f6e616d652c7075625f69642c7075625f6e616d652c7075626c69736865725f69642c7075626c69736865725f7372632c7075626c69736865725f737263312c7075626c69736865725f737263322c7075626c69736865725f737263332c7075626c69736865725f737263342c7075626c69736865725f737263352c737570706c795f747970652c726567696f6e5f69642c726f7461746f725f69642c7274622c7365635f636f756e7465722c736d6172746c696e6b5f69642c75726c5f69642c77696e5f7072696365222c226d657472696373223a22636f756e742c7274625f636f756e742c6269645f726573706f6e73652c77696e732c636c69636b732c636c69636b735f676c6f62616c2c636f6e76657273696f6e2c636c69636b66726175645f636f756e742c636f6e76657273696f6e66726175645f636f756e742c6269645f7370656e742c6d696e5f6269642c6d61785f6269642c6d696e5f77696e2c6576656e74732c6269645f74696d655f746f74616c2c6d61785f6269645f74696d652c6d696e5f6269645f74696d652c6d61785f77696e2c77696e5f7370656e742c636f6d70616e795f7370656e742c616476657274697365725f7370656e742c6d61785f616476657274697365725f6261672c746f74616c5f616476657274697365725f6261642c746f74616c5f616476657274697365725f7472616e73616374696f6e2c726576656e75652c7065657233395f70726963652c747261636b5f726576656e7565222c22736861726453706563223a7b2274797065223a226c696e656172222c22706172746974696f6e4e756d223a307d2c2262696e61727956657273696f6e223a392c2273697a65223a3634333332333237332c226964656e746966696572223a226275636b5f62696464696e675f323031372d30322d31395432303a30303a30302e3030302d30353a30305f323031372d30322d31395432313a30303a30302e3030302d30353a30305f323031372d30322d31395432303a30303a30302e3030302d30353a3030227d,s{107378822171,107378822171,1487562164742,1487562164742,0,0,0,457238547567738880,1617,0,107378822171}

2017-02-19 22:42:44,744 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] New request[LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] with zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00].

2017-02-19 22:42:44,744 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Loading segment buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00

2017-02-19 22:42:54,905 WARN i.d.s.l.SegmentLoaderLocalCacheManager [ZkCoordinator-0] Segment [buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] is different than expected size. Expected [643323273] found [642701534]

2017-02-19 22:42:54,908 INFO i.d.s.c.BatchDataSegmentAnnouncer [ZkCoordinator-0] Announcing segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] at existing path[/druid-aws/segments/10.80.4.55:8081/10.80.4.55:8081_historical__default_tier_2017-01-10T10:36:21.934-05:00_5a9a3cf46507491bbf5b3a5ddadb99ca489]

2017-02-19 22:42:54,913 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Got WatchedEvent state:SyncConnected type:NodeDeleted path:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 for sessionid 0x45870066b440773

2017-02-19 22:42:54,913 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:null serverPath:null finished:false header:: 3473,2 replyHeader:: 3473,107378822620,0 request:: '/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00,-1 response:: null

2017-02-19 22:42:54,913 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Completed request [LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]

2017-02-19 22:42:54,913 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] was removed

``

and the folder has:

[root@aws-druid-history18 logs]# ls -la /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/

total 627656

drwxr-xr-x 2 root root 4096 Feb 19 22:42 .

drwxr-xr-x 3 root root 4096 Feb 19 22:42 …

-rw-r–r-- 1 root root 642698582 Feb 19 22:42 00000.smoosh

-rw-r–r-- 1 root root 2948 Feb 19 22:42 meta.smoosh

-rw-r–r-- 1 root root 4 Feb 19 22:42 version.bin

``

Thanks,

Maurizio

Hi guys,
any help on this?

Thanks

Maurizio

looking at the WARN message in the logs it looks like your segment got corrupt somehow. Can you reindex data for this interval and retry ?
2017-02-19 22:42:54,905 WARN i.d.s.l.SegmentLoaderLocalCacheManager [ZkCoordinator-0] Segment [buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] is different than expected size. Expected [643323273] found [642701534]

Hi Nishant,
thanks for the feedback, I’m trying to reindex the segment.

Could the reason of different size be related to the fact that I’ve two Realtime nodes that do the same job ingesting from Kafka the same datasources and partitions at the same time?

This is a configuration that I’ve since two years and never received any kind of error. A few weeks ago I’ve updated to 0.9.1.1 version.

In the realtime spec file what I’ve for each datasource is:

“shardSpec”: {

“type”: “linear”,

“partitionNum”: 0

}

Thanks

Maurizio