insert-segment-to-db generated loadSpec path is relative, segments not loaded by druid

Hi

We use Imply 1.2.0.

We populated the PostgreSQL segments table using insert-segment-to-db. The difference we see from the earlier metadata in the segments table is that the loadSpec (after hex-decoding the payload) changed from

"loadSpec":{"type":"hdfs","path":"hdfs://master-7e09d585.node.rsa:9000/data/druid/segments/…/0/index.zip"}

to

"loadSpec":{"type":"hdfs","path":"/data/druid/segments/…/0/index.zip"}
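
For reference, this is roughly how the payload can be hex-decoded for inspection; a minimal sketch that assumes the default druid_segments table name and the "druid" database from our connectURI:

    # assumes the default druid_segments table name and the "druid" database
    psql -h $POSTGRESQL_HOST -p $POSTGRESQL_PORT -U $DRUID_USER -d druid \
      -c "SELECT id, convert_from(payload, 'UTF8') AS payload_json FROM druid_segments LIMIT 5;"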

When Druid started after this, it failed with the following error:

java.io.FileNotFoundException: File /data/druid/segments/datasource1/20160816T083000.000Z_20160816T084500.000Z/2016-08-16T09_06_16.511Z/0/index.zip does not exist

Is there a way we can either generate the full HDFS URL from the tool, or make Druid load segments from the generated relative URL?

The command used to insert segments was:

java \
  -Ddruid.metadata.storage.type=postgresql \
  -Ddruid.metadata.storage.connector.connectURI=jdbc:postgresql://$POSTGRESQL_HOST:$POSTGRESQL_PORT/druid \
  -Ddruid.metadata.storage.connector.user=$DRUID_USER \
  -Ddruid.metadata.storage.connector.password=$DRUID_PASSWORD \
  -Ddruid.extensions.loadList=["postgresql-metadata-storage","druid-hdfs-storage"] \
  -Ddruid.storage.type=hdfs \
  -classpath '/root/imply-1.2.0/dist/druid/lib/*' \
  io.druid.cli.Main tools insert-segment-to-db --workingDir hdfs://$HDFS_HOST:$HDFS_PORT//data/druid/segments

Thanks

Ashish

The following is the full stack trace. It looks like Druid is looking for the file on the local file system rather than in HDFS:

java.io.FileNotFoundException: File /data/druid/segments/sessionsSummary/20160816T084500.000Z_20160816T090000.000Z/2016-08-16T09_06_18.360Z/0/index.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) ~[?:?]
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722) ~[?:?]
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) ~[?:?]
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) ~[?:?]
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137) ~[?:?]
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) ~[?:?]
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765) ~[?:?]
    at io.druid.storage.hdfs.HdfsDataSegmentPuller$1.openInputStream(HdfsDataSegmentPuller.java:107) ~[?:?]
    at io.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
    at io.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:241) ~[?:?]
    at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:138) ~[java-util-0.27.7.jar:?]
    at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:134) ~[java-util-0.27.7.jar:?]
    at com.metamx.common.RetryUtils.retry(RetryUtils.java:38) [java-util-0.27.7.jar:?]
    at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:132) [java-util-0.27.7.jar:?]
    at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:235) [druid-hdfs-storage-0.9.0.jar:0.9.0]
    at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:62) [druid-hdfs-storage-0.9.0.jar:0.9.0]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) [druid-server-0.9.0.jar:0.3.16]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) [druid-server-0.9.0.jar:0.3.16]
    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) [druid-server-0.9.0.jar:0.9.0]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) [druid-server-0.9.0.jar:0.9.0]
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.0.jar:0.9.0]
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.0.jar:0.9.0]
    at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.0.jar:0.9.0]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:518) [curator-recipes-2.9.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:512) [curator-recipes-2.9.1.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.9.1.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.9.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:509) [curator-recipes-2.9.1.jar:?]
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.9.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:766) [curator-recipes-2.9.1.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_74]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_74]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]

The problem was resolved by adding core-site.xml under conf/druid/_common/, as described in http://druid.io/docs/latest/tutorials/cluster.html
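
For anyone hitting the same issue, a core-site.xml along these lines (with fs.defaultFS pointing at your namenode, ours being the one from the loadSpec above) should be what Druid needs to resolve the relative paths. This is only a minimal sketch; your cluster may need more of the Hadoop client configuration here:

    <?xml version="1.0"?>
    <!-- minimal sketch of conf/druid/_common/core-site.xml; only fs.defaultFS is shown -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master-7e09d585.node.rsa:9000</value>
      </property>
    </configuration>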

It looks like io.druid.storage.hdfs.HdfsDataSegmentFinder converts the absolute path to a relative one when the tool runs.

While loading a segment from a relative path, io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles calls path.getFileSystem(config), and in the absence of "fs.defaultFS" in the Hadoop configuration, org.apache.hadoop.fs.FileSystem treats the path as a file on the local disk.
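
The same fallback is easy to see with the plain Hadoop CLI (assuming it is installed on the machine): without fs.defaultFS, a scheme-less path resolves against the local filesystem, while a fully qualified one goes to HDFS:

    # with no fs.defaultFS configured, this lists the *local* /data/druid/segments (or fails if it does not exist)
    hadoop fs -ls /data/druid/segments
    # fully qualifying the path (or setting fs.defaultFS in core-site.xml) targets HDFS instead
    hadoop fs -ls hdfs://master-7e09d585.node.rsa:9000/data/druid/segments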

Ideally, the insert-segment-to-db tool should write an absolute URL, to be consistent with the paths inserted during indexing.

Also, at read time, if the loadSpec type is "hdfs" and the path is a relative URL, then instead of assuming a local file, the druid.storage.storageDirectory path could be used to locate the file.
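
Something like the following is what I have in mind; this is only a rough sketch of the idea, not actual Druid code, and the class and method names are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.IOException;

    // Rough sketch (hypothetical, not actual Druid code): if the loadSpec path has no
    // scheme, qualify it against the filesystem of druid.storage.storageDirectory
    // instead of falling back to the local filesystem.
    class RelativeLoadSpecResolver
    {
      static Path resolve(String loadSpecPath, String storageDirectory, Configuration conf) throws IOException
      {
        Path segmentPath = new Path(loadSpecPath);
        if (segmentPath.toUri().getScheme() != null) {
          return segmentPath; // already absolute, e.g. hdfs://namenode:9000/...
        }
        // storageDirectory is druid.storage.storageDirectory, e.g. hdfs://namenode:9000/data/druid/segments
        FileSystem fs = new Path(storageDirectory).getFileSystem(conf);
        return fs.makeQualified(segmentPath); // attaches the scheme and authority of deep storage
      }
    }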

Hi Ashish, do you mind helping to improve the documentation for others who use this tool?

You can update the docs here: https://github.com/druid-io/druid/blob/master/docs/content/operations/insert-segment-to-db.md

Sure, will do that.

Doc update submitted, please check.