Historical node can't load segments

Hi,

I need help with a problem I am facing in my 2-node cluster, with the overlord running in local mode.

I am unable to get any segments handed off to the historical node.

I am doing realtime ingestion using a realtime task with kafka-8.

I have read the FAQs at http://druid.io/docs/latest/ingestion/faq.html and couldn't match my problem to any of the scenarios there.

I am attaching the task payload and configs for all the nodes.

The topology is as follows:

Machines

M1: 10.1.0.167

M2: 10.1.0.166

M1:

overlord(local)

zookeeper

metadata storage(postgresql)

M2:

historical

broker

coordinator

The historical node continues to log this pattern:

2016-03-08T12:55:35,288 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.1.0.166:8083/saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z] was removed

2016-03-08T12:56:05,186 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z] with zNode[/druid/loadQueue/10.1.0.166:8083/saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z].

2016-03-08T12:56:05,186 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z

2016-03-08T12:56:05,186 INFO [ZkCoordinator-0] io.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://my-s3-bucket/druid/segments/saurabhtopicds3/2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z/2016-03-08T11:50:17.435Z/0/index.zip] to outDir[var/druid/segment-cache/saurabhtopicds3/2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z/2016-03-08T11:50:17.435Z/0]

2016-03-08T12:56:05,361 INFO [ZkCoordinator-0] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Asked to cleanup something[DataSegment{size=308757419, shardSpec=NoneShardSpec, metrics=[count], dimensions=[dim1, dim2, dim3, dim4, dim5, dim6, dim7, dim8, dim9], version='2016-03-08T11:50:17.435Z', loadSpec={type=s3_zip, bucket=my-s3-bucket, key=druid/segments/saurabhtopicds3/2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z/2016-03-08T11:50:17.435Z/0/index.zip}, interval=2016-03-08T11:50:00.000Z/2016-03-08T12:00:00.000Z, dataSource='saurabhtopicds3', binaryVersion='9'}] that didn't exist. Skipping.

2016-03-08T12:56:05,361 WARN [ZkCoordinator-0] io.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z]

2016-03-08T12:56:05,361 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completely removing [saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z] in [30,000] millis

2016-03-08T12:56:05,363 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [LOAD: saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z]

2016-03-08T12:56:05,363 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z], segment=DataSegment{size=308757419, shardSpec=NoneShardSpec, metrics=[count], dimensions=[dim1, dim2, dim3, dim4, dim5, dim6, dim7, dim8, dim9], version='2016-03-08T11:50:17.435Z', loadSpec={type=s3_zip, bucket=my-s3-bucket, key=druid/segments/saurabhtopicds3/2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z/2016-03-08T11:50:17.435Z/0/index.zip}, interval=2016-03-08T11:50:00.000Z/2016-03-08T12:00:00.000Z, dataSource='saurabhtopicds3', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:518) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:512) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.9.1.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:509) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.9.1.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:766) [curator-recipes-2.9.1.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_80]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_80]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_80]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_80]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]

at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]

Caused by: io.druid.segment.loading.SegmentLoadingException: No such file or directory

at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:241) ~[?:?]

at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.0-rc1.jar:0.3.16]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.0-rc1.jar:0.3.16]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.0-rc1.jar:0.9.0-rc1]

... 18 more

Caused by: java.io.IOException: No such file or directory

at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.7.0_80]

at java.io.File.createTempFile(File.java:2001) ~[?:1.7.0_80]

at java.io.File.createTempFile(File.java:2047) ~[?:1.7.0_80]

at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:149) ~[java-util-0.27.7.jar:?]

at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:210) ~[?:?]

at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.0-rc1.jar:0.3.16]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.0-rc1.jar:0.3.16]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.0-rc1.jar:0.9.0-rc1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.0-rc1.jar:0.9.0-rc1]

... 18 more

2016-03-08T12:56:05,368 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.1.0.166:8083/saurabhtopicds3_2016-03-08T11:50:00.000Z_2016-03-08T12:00:00.000Z_2016-03-08T11:50:17.435Z] was removed

broker.properties (401 Bytes)

common.properties (936 Bytes)

cordinator.properties (131 Bytes)

historical.properties (448 Bytes)

overlord.properties (393 Bytes)

realtimetask.json (1.68 KB)

Please make sure that you have free space in the tmp directory, or try setting the JVM to use something outside /tmp via -Djava.io.tmpdir=<a path>. From http://druid.io/docs/latest/configuration/:

"Various parts of the system that interact with the file system do it via temporary files, and these files can get somewhat large. Many production systems are set up to have small (but fast) /tmp directories, which can be problematic with Druid, so we recommend pointing the JVM's tmp directory to something with a little more meat."
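
For example (the path here is just a placeholder, pick any directory on a volume with enough space), the tmp dir can be set in the arguments used to launch the historical node:

java -Djava.io.tmpdir=/mnt/druid-tmp -cp <classpath> io.druid.cli.Main server historical

or by putting the same -Djava.io.tmpdir line in the node's jvm.config, if you launch it that way.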

I have provided an explicit path for the tmpdir, and it has ample space, so I don't think that is the problem. What do you make of the logs?

You are getting this exception (see inline), which means either you have a typo in the supplied tmp path or that path does not exist. What are you providing as parameters to the JVM?

at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.7.0_80]

at java.io.File.createTempFile(File.java:2001) ~[?:1.7.0_80]

at java.io.File.createTempFile(File.java:2047) ~[?:1.7.0_80]

at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:149) ~[java-util-0.27.7.jar:?]

at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:210) ~[?:?]

at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]

...
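
You can check this quickly with a minimal standalone sketch (the path below is a placeholder; substitute whatever you pass as -Djava.io.tmpdir). File.createTempFile does not create a missing directory, so if the path does not exist, this fails with exactly the IOException at the bottom of your trace:

import java.io.File;
import java.io.IOException;

public class TmpDirProbe
{
  public static void main(String[] args) throws IOException
  {
    // Placeholder: use the same value you pass as -Djava.io.tmpdir.
    File tmpDir = new File("/mnt/druid-tmp");

    // createTempFile does NOT create missing directories; if tmpDir does
    // not exist, this throws java.io.IOException: No such file or directory
    // from UnixFileSystem.createFileExclusively, as in the trace above.
    File probe = File.createTempFile("druid-probe", ".tmp", tmpDir);
    System.out.println("tmp dir is usable: " + probe.getAbsolutePath());
    probe.delete();
  }
}

If that throws, the directory either does not exist or is not writable by the Druid process.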

Okay, it was due to the tmp directory not already existing. I had assumed it would be created automatically when needed.
Working fine now.
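
For anyone else who hits this: neither the JVM nor this code path in Druid creates the directory named by -Djava.io.tmpdir, so it has to exist before the node starts. In my case (the path is just my setup) the fix was simply:

mkdir -p /mnt/druid-tmp

followed by restarting the historical node.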

Thank you.