Deep storage (HDFS) change problem

Hi Druid users,

I have a problem changing deep storage (moving to a new HDFS cluster).

Test environment:

  • CentOS 6.6

  • Druid v0.9.0

  • Druid cluster nodes:

      • server#1 : overlord, coordinator

      • server#2 : broker

      • server#3 : middleManager

      • server#4 : historical

      • server#5 : MySQL (metadata storage)

  • Two HDFS clusters (old and new)

Test sequence:

  • Copy all Druid segment data from the current HDFS cluster to the new one (a distcp sketch follows this list).

  • Change every HDFS-related configuration (Druid common.runtime.properties, etc.).

  • Restart all Druid nodes.

  • Check the status of all Druid nodes.

  • Shut down the old HDFS cluster.

  • Add a new Historical node configured against the new HDFS cluster.

  • Check segment rebalancing progress.
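For reference, the copy and reconfiguration steps looked roughly like this (a sketch only; the old-namenode/new-namenode hostnames are placeholders for my real NameNodes):

  # Copy every Druid segment directory from the old cluster to the new one.
  hadoop distcp \
    hdfs://old-namenode:8020/user/root/druid/segments \
    hdfs://new-namenode:8020/user/root/druid/segments

  # Then point Druid at the new cluster in common.runtime.properties:
  #   druid.storage.type=hdfs
  #   druid.storage.storageDirectory=hdfs://new-namenode:8020/user/root/druid/segments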

I expected that the coordinator would rebalance segments using the information in the new HDFS, and that after a segment loaded in its new place the coordinator would update the metadata in the druid_segments table with the new segment information.

But it does not use the new HDFS information; it uses the existing metadata in the druid_segments table's payload column for rebalancing.

So segment rebalancing to the new Historical node failed (the new Historical node could not load the segments).
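For example, a query like this against the metadata store (the database name and credentials are placeholders for mine) shows that the loadSpec inside the payload column still points at the old cluster:

  mysql -u druid -p druid -e \
    "SELECT id, CONVERT(payload USING utf8) FROM druid_segments LIMIT 1;"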

[new historical node error log sample]

In the log, 'hdfs://lean07.lean.com:8020/user/root/druid/segments/syslog' is the old HDFS path.

2017-06-26T05:34:20,323 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[syslog_2017-06-16T06:00:00.000Z_2017-06-16T07:00:00.000Z_2017-06-16T05:58:18.484Z_3], segment=DataSegment{size=345374710, shardSpec=LinearShardSpec{partitionNum=3}, metrics=[count], dimensions=[action, action_csct, action_rate, cause, cause_csct, conn_seq, dvc_type, event_code, ip, mac, model_name, scn], version='2017-06-16T05:58:18.484Z', loadSpec={type=hdfs, path=hdfs://lean07.lean.com:8020/user/root/druid/segments/syslog/20170616T060000.000Z_20170616T070000.000Z/2017-06-16T05_58_18.484Z/3/index.zip}, interval=2017-06-16T06:00:00.000Z/2017-06-16T07:00:00.000Z, dataSource='syslog', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[syslog_2017-06-16T06:00:00.000Z_2017-06-16T07:00:00.000Z_2017-06-16T05:58:18.484Z_3]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.0.jar:0.9.0]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.0.jar:0.9.0]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.0.jar:0.9.0]

at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.0.jar:0.9.0]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:518) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:512) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.9.1.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:509) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:766) [curator-recipes-2.9.1.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473) [?:1.7.0_141]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_141]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473) [?:1.7.0_141]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_141]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_141]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_141]

at java.lang.Thread.run(Thread.java:748) [?:1.7.0_141]

Caused by: io.druid.segment.loading.SegmentLoadingException: Error loading [hdfs://lean07.lean.com:8020/user/root/druid/segments/syslog/20170616T060000.000Z_20170616T070000.000Z/2017-06-16T05_58_18.484Z/3/index.zip]

at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:284) ~[?:?]

at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:62) ~[?:?]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.0.jar:0.3.16]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.0.jar:0.3.16]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.0.jar:0.9.0]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.0.jar:0.9.0]

… 18 more

Caused by: java.net.ConnectException: Call From lean14.lean.com/10.0.2.14 to lean07.lean.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.7.0_141]

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[?:1.7.0_141]

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.7.0_141]

Question:

How can I change deep storage (HDFS) without segment rebalancing failing?

Is there a recommended way to change the HDFS cluster?

I'd appreciate any advice.

Thank you.

Hi,
The coordinator does not automatically update the segment metadata.

If you are moving the segments to a new HDFS location, you will also need to manually update the payload of the segment metadata entries in the druid_segments table.
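If only the NameNode address changed and the segment paths are otherwise identical, a minimal sketch of that update (assuming MySQL metadata storage; hostnames, database name, and credentials are placeholders; the payload is a BLOB holding the segment JSON, so a byte-level REPLACE works):

  # Back up the table first, then rewrite the old NameNode address in every payload.
  mysqldump -u druid -p druid druid_segments > druid_segments_backup.sql
  mysql -u druid -p druid -e \
    "UPDATE druid_segments SET payload = REPLACE(payload, 'hdfs://old-namenode:8020', 'hdfs://new-namenode:8020');"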

You can do this by executing an SQL command, or you can try the Druid insert-segment-to-db tool (http://druid.io/docs/latest/operations/insert-segment-to-db.html) to create new segment metadata entries in a new table. For added safety, remember to take a backup of the metadata storage before making any changes to it.
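A sketch of an insert-segment-to-db invocation under those assumptions (MySQL metadata storage plus the HDFS extension; the connectURI, credentials, classpath, and paths below are placeholders — check the linked doc for the exact options in your version):

  java \
    -Ddruid.metadata.storage.type=mysql \
    -Ddruid.metadata.storage.connector.connectURI='jdbc:mysql://server5:3306/druid' \
    -Ddruid.metadata.storage.connector.user=druid \
    -Ddruid.metadata.storage.connector.password=yourpassword \
    -Ddruid.extensions.loadList='["mysql-metadata-storage","druid-hdfs-storage"]' \
    -Ddruid.storage.type=hdfs \
    -cp 'lib/*:conf/druid/_common' \
    io.druid.cli.Main tools insert-segment-to-db \
    --workingDir hdfs://new-namenode:8020/user/root/druid/segments \
    --updateDescriptor true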