No available datasource after successful ingestion task run

Hello,

I am trying to run the Wikipedia tutorial and was able to successfully run the ingestion task. I then checked the coordinator console at localhost:8081, but the wikipedia datasource never becomes available.

I am running the entire project locally, using the Docker container currently published to the GitHub repo:

Attached are the logs for the historical and coordinator services:

When looking through the logs of the historical node I see the following:

2019-04-10T03:15:49,154 ERROR [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[transform-tutorial_2018-01-01T00:00:00.000Z_2018-01-08T00:00:00.000Z_2019-04-09T04:46:05.229Z], segment=DataSegment{size=1821, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[count, number, triple-number], dimensions=[animal, location], version='2019-04-09T04:46:05.229Z', loadSpec={type=>local, path=>/opt/apache-druid-0.15.0-incubating-SNAPSHOT/var/druid/segments/transform-tutorial/2018-01-01T00:00:00.000Z_2018-01-08T00:00:00.000Z/2019-04-09T04:46:05.229Z/0/index.zip}, interval=2018-01-01T00:00:00.000Z/2018-01-08T00:00:00.000Z, dataSource='transform-tutorial', binaryVersion='9'}}
org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[transform-tutorial_2018-01-01T00:00:00.000Z_2018-01-08T00:00:00.000Z_2019-04-09T04:46:05.229Z]
at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:268) ~[druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:312) [druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:47) [druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:118) [druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538) [curator-recipes-4.1.0.jar:4.1.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:532) [curator-recipes-4.1.0.jar:4.1.0]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-4.1.0.jar:4.1.0]
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) [curator-client-4.1.0.jar:?]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-4.1.0.jar:4.1.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:530) [curator-recipes-4.1.0.jar:4.1.0]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-4.1.0.jar:4.1.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808) [curator-recipes-4.1.0.jar:4.1.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.lang.IllegalArgumentException: Instantiation of [simple type, class org.apache.druid.segment.loading.LocalLoadSpec] value failed: [/opt/apache-druid-0.15.0-incubating-SNAPSHOT/var/druid/segments/transform-tutorial/2018-01-01T00:00:00.000Z_2018-01-08T00:00:00.000Z/2019-04-09T04:46:05.229Z/0/index.zip] does not exist
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3459) ~[jackson-databind-2.6.7.jar:2.6.7]
at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3378) ~[jackson-databind-2.6.7.jar:2.6.7]
at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:235) ~[druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]
at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:224) ~[druid-server-0.15.0-incubating-SNAPSHOT.jar:0.15.0-incubating-SNAPSHOT]

It looks like it’s configured to use “local” deep storage, but the Docker setup is distributed (each Druid process runs in its own container), so the published segments are not visible to the other containers.

You’ll need to use distributed deep storage (the environment is set up for Azure by default) or adjust the Docker containers to share a directory and use that as your “local” deep storage.
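If you go the shared-directory route, the deep storage settings in common.runtime.properties would look something like this (a sketch only; the /opt/shared/segments path is a placeholder I’m assuming here, and the only real requirement is that every container sees the same files at that path):

druid.storage.type=local
druid.storage.storageDirectory=/opt/shared/segments

Any path works as long as the process writing segments at the end of the ingestion task and the historical process loading them both see the same directory.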

Hi Jon, thanks for the reply.

How would I go about setting up the Docker containers to have a shared directory?

I tried setting the deep storage type to local in the common.runtime.properties file as follows:

druid.storage.type=local
druid.storage.storageDirectory=/opt/druid/var/segments

My thinking was that since the docker-compose.yml mounts /opt/druid/var as a volume on all of the Druid nodes, that would work as the shared directory, but it did not.

Is the segment index.zip supposed to be present in all of the Druid containers? I only see it in the overlord container.

I am still having trouble setting up a shared volume between the containers. Any idea how I would set this up? I tried adding a shared volume in the docker-compose.yml file but had no luck. Does it also need to be specified in the Dockerfile?

https://github.com/apache/incubator-druid/blob/master/distribution/docker/docker-compose.yml

I haven’t tried this myself, but note that the compose file declares a separate named volume for each service, all mounted at /opt/druid/var, so that directory is not actually shared between the containers. You could try pointing these volume definitions at directories on your host that will be shared across containers:

volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  overlord_var: {}
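I haven’t verified this against that compose file either, but another option that keeps the per-service volumes intact is to add one extra volume, mount it into every Druid service at the same path, and point deep storage there. A sketch (the druid_shared name and the /opt/shared mount point are placeholders of my own, not something already in the repo, and I’m only showing two services):

volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  overlord_var: {}
  druid_shared: {}                   # new: one volume visible to every Druid service

services:
  middlemanager:
    volumes:
      - middle_var:/opt/druid/var
      - druid_shared:/opt/shared     # ingestion tasks write segments here
  historical:
    volumes:
      - historical_var:/opt/druid/var
      - druid_shared:/opt/shared     # historical reads them back from the same path

With that in place, common.runtime.properties would point deep storage at the shared mount rather than at the per-container var directory:

druid.storage.type=local
druid.storage.storageDirectory=/opt/shared/segments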