Deleted a segment from HDFS directly

I deleted a segment from our HDFS directly, and now my historical node won't start up. It looks like it is still trying to load the missing segment. I also went into MySQL and deleted the record for that segment, but the historical node is still trying to load it. How do I recover from this?

Hi Amol,

Could you please provide some logs? Did you also clean up the historical node's cache directory?

Hi Bingkun,

Where is the historical node's cache directory?

Here is the log from the historical node when it tries to load the segment handed off by realtime.

2015-11-20T17:56:21,303 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z], segment=DataSegment{size=9852, shardSpec=NoneShardSpec, metrics=[count, added, deleted, delta], dimensions=[anonymous, city, continent, country, language, namespace, newPage, page, region, robot, unpatrolled, user], version='2015-11-13T19:23:45.354Z', loadSpec={type=hdfs, path=/AAA/druid/segments/wikipedia/20130831T000000.000Z_20130901T000000.000Z/2015-11-13T19_23_45.354Z/0/index.zip}, interval=2013-08-31T00:00:00.000Z/2013-09-01T00:00:00.000Z, dataSource='wikipedia', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]

at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]

at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [?:1.7.0_05]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [?:1.7.0_05]

at java.lang.Thread.run(Thread.java:722) [?:1.7.0_05]

Caused by: io.druid.segment.loading.SegmentLoadingException: /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2015-11-13T19:23:45.354Z/0/index.drd (No such file or directory)

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:40) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

… 20 more

Caused by: java.io.FileNotFoundException: /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2015-11-13T19:23:45.354Z/0/index.drd (No such file or directory)

at java.io.FileInputStream.open(Native Method) ~[?:1.7.0_05]

at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.7.0_05]

at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:24) ~[druid-api-0.3.9.jar:0.8.1-iap2]

at io.druid.segment.IndexIO.loadIndex(IndexIO.java:165) ~[druid-processing-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:37) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]

… 20 more

2015-11-20T17:56:21,304 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z] was removed

2015-11-20T17:56:51,267 INFO [ZkCoordinator-Exec--0] io.druid.server.coordination.ServerManager - Told to delete a queryable for a dataSource[wikipedia] that doesn't exist.

2015-11-20T17:56:51,267 WARN [ZkCoordinator-Exec--0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[/tmp/druid/indexCache/info_dir/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z]

2015-11-20T17:57:21,282 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z] with zNode[/druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z].

2015-11-20T17:57:21,282 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z

2015-11-20T17:57:21,285 WARN [ZkCoordinator-0] com.metamx.common.RetryUtils - Failed on try 1, retrying in 2,044ms.

java.io.FileNotFoundException: File /AAA/druid/segments/wikipedia/20130831T000000.000Z_20130901T000000.000Z/2015-11-13T19_23_45.354Z/0/index.zip does not exist

at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) ~[?:?]

at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722) ~[?:?]

at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) ~[?:?]

at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) ~[?:?]

at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137) ~[?:?]

at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) ~[?:?]

at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765) ~[?:?]

at io.druid.storage.hdfs.HdfsDataSegmentPuller$1.openInputStream(HdfsDataSegmentPuller.java:108) ~[?:?]

at io.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:299) ~[?:?]

at io.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:242) ~[?:?]

at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:136) ~[java-util-0.27.0.jar:?]

at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:132) ~[java-util-0.27.0.jar:?]

at com.metamx.common.RetryUtils.retry(RetryUtils.java:38) [java-util-0.27.0.jar:?]

at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:130) [java-util-0.27.0.jar:?]

at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:236) [druid-hdfs-storage-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:59) [druid-hdfs-storage-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:141) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]

at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]

at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [?:1.7.0_05]

Hi Amol,

Are you sure the segment "wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z" has actually been removed from MySQL?
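One quick way to check is to query the metadata store directly. This is just a sketch: the table name druid_segments is the Druid default (yours may have a different prefix), and the database/user names below are placeholders for your own setup.

```shell
# Check whether the segment row is really gone from the metadata store.
# "druid_segments" is the default table name; your metadata storage
# prefix, database name, and credentials may differ.
mysql -u druid -p druid -e "
  SELECT id, used
  FROM druid_segments
  WHERE id = 'wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z';"
```

If the row still exists, the coordinator will keep asking historicals to load the segment.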

It's also possible that your ZooKeeper has stale information under "/druid/loadQueue/"; can you try restarting ZooKeeper?

From your log, your segment cache is under /tmp/druid/indexCache/ (with the segment info files in /tmp/druid/indexCache/info_dir/); more details about the historical segment cache can be found at http://druid.io/docs/latest/configuration/historical.html#storing-segments.
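Putting it together, the recovery would look roughly like the following. This is a sketch: the paths and the ZooKeeper znode are copied from the log above, so verify them against your own druid.segmentCache.locations setting and ZooKeeper base path before deleting anything.

```shell
# 1. Stop the historical node first so it does not race the cleanup.

# 2. Remove the cached segment files and the segment info file
#    (paths taken from the log above; yours may differ).
rm -rf /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z
rm -f  /tmp/druid/indexCache/info_dir/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z

# 3. Remove the stale load-queue entry from ZooKeeper
#    (znode path taken from the log above; adjust the -server address).
zkCli.sh -server localhost:2181 delete \
  "/druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z"

# 4. Restart the historical node.
```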

Thanks. Clearing the local cache and removing the stale nodes from ZooKeeper worked for us.

If you run the indexing service, you can also look into kill tasks to automate hard-deleting segments from HDFS:

http://druid.io/docs/latest/misc/tasks.html
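For reference, a kill task is a small JSON spec posted to the overlord's task endpoint. A sketch (the overlord host and port below are assumptions; substitute your own):

```shell
# Submit a kill task to the indexing service (overlord) to hard-delete
# segments for an interval from deep storage and the metadata store.
# Replace overlord-host:8090 with your overlord's actual address.
curl -X POST -H 'Content-Type: application/json' \
  http://overlord-host:8090/druid/indexer/v1/task \
  -d '{
    "type": "kill",
    "dataSource": "wikipedia",
    "interval": "2013-08-31/2013-09-01"
  }'
```

Note that kill tasks only remove segments that are already marked unused in the metadata store, so this avoids the inconsistent state you get from deleting files out from under Druid.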