Getting 'Failed to create directory within 10000 attempts' when configured with S3

Hi,

I am unable to ingest data after configuring S3 as deep storage. I have a cluster of 4 nodes (broker, historical, middle manager, and overlord+coordinator).

Below is the exception:

2016-08-29T06:47:50,701 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/home/druid/druid-0.9.1.1/hist_data.csv]
2016-08-29T06:47:50,767 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_temperature_stream_2016-08-29T06:47:41.115Z, type=index, dataSource=temperature_stream}]
java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1472453270713-0 to 1472453270713-9999)
	at com.google.common.io.Files.createTempDir(Files.java:600) ~[guava-16.0.1.jar:?]
	at io.druid.segment.indexing.RealtimeTuningConfig.createNewBasePersistDirectory(RealtimeTuningConfig.java:56) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.segment.indexing.RealtimeTuningConfig.<init>(RealtimeTuningConfig.java:118) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.convertTuningConfig(IndexTask.java:146) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:376) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
2016-08-29T06:47:50,775 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_temperature_stream_2016-08-29T06:47:41.115Z] status changed to [FAILED].
2016-08-29T06:47:50,779 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_temperature_stream_2016-08-29T06:47:41.115Z",
  "status" : "FAILED",
  "duration" : 4134
}

I have checked that java.io.tmpdir exists in S3. Could you tell me which directory this process is trying to create? Or is there something I may have missed?

Please find the attached files.

  1. Batch Ingestion Task submitted.

  2. Common Runtime Properties

  3. Middle Manager Runtime Properties

batch_ingest.json (1.79 KB)

common.runtime.properties (3.89 KB)

runtime.properties (664 Bytes)

Hi Prerna,
It's trying to create a subdirectory on the middle manager node under java.io.tmpdir.

Try setting -Djava.io.tmpdir to a directory with read/write permissions in your JVM arguments.
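
For example (a minimal sketch; /mnt/druid/tmp is just an illustrative path, not from your setup, and the javaOpts line matters because index tasks actually run in the peon JVMs that the middle manager spawns):

    # create a writable temp directory on the middle manager node
    # (/mnt/druid/tmp is an example path - use whatever fits your layout)
    mkdir -p /mnt/druid/tmp
    chmod 700 /mnt/druid/tmp

Then in the middle manager's JVM arguments:

    -Djava.io.tmpdir=/mnt/druid/tmp

and in its runtime.properties, so the peons inherit it too:

    druid.indexer.runner.javaOpts=-server -Djava.io.tmpdir=/mnt/druid/tmp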

Thanks a lot Nishant.

Could you also help me with this exception? This happens when fetching historical data segments from S3. Which directory is it complaining about?

2016-08-30T03:10:40,880 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Completed request [LOAD: temperature_stream_2016-06-04T00:00:00.000Z_2016-06-05T00:00:00.000Z_2016-08-29T07:43:24.379Z]
2016-08-30T03:10:40,880 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[temperature_stream_2016-06-04T00:00:00.000Z_2016-06-05T00:00:00.000Z_2016-08-29T07:43:24.379Z], segment=DataSegment{size=100384, shardSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=1, partitionDimensions=}, metrics=[count_events, temperature, min_temp, max_temp], dimensions=[timestamp, sensorId, sensorName, sensorLat, sensorLong], version='2016-08-29T07:43:24.379Z', loadSpec={type=s3_zip, bucket=servian-mel-druid-storage, key=druid/segments/temperature_stream/2016-06-04T00:00:00.000Z_2016-06-05T00:00:00.000Z/2016-08-29T07:43:24.379Z/0/index.zip}, interval=2016-06-04T00:00:00.000Z/2016-06-05T00:00:00.000Z, dataSource='temperature_stream', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[temperature_stream_2016-06-04T00:00:00.000Z_2016-06-05T00:00:00.000Z_2016-08-29T07:43:24.379Z]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.1.1.jar:0.9.1.1]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.10.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.10.0.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.10.0.jar:?]
    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.10.0.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772) [curator-recipes-2.10.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: io.druid.segment.loading.SegmentLoadingException: No such file or directory
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:238) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    ... 18 more
Caused by: java.io.IOException: No such file or directory
    at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.8.0_91]
    at java.io.File.createTempFile(File.java:2024) ~[?:1.8.0_91]
    at java.io.File.createTempFile(File.java:2070) ~[?:1.8.0_91]
    at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:149) ~[java-util-0.27.9.jar:?]
    at io.druid.storage.s3.S3DataSegmentPuller.getSegmentFiles(S3DataSegmentPuller.java:207) ~[?:?]
    at io.druid.storage.s3.S3LoadSpec.loadSegment(S3LoadSpec.java:62) ~[?:?]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
    ... 18 more

Hi Prerna, in your historical configuration there should be a segmentCache directory where segments get downloaded locally. Does that directory exist? Do you have proper permissions for it?
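
For reference, it is configured in the historical runtime.properties along these lines (the path and maxSize here are just examples, not your values):

    # local directory where segments fetched from deep storage are cached;
    # it must exist and be writable by the user running the historical node
    druid.segmentCache.locations=[{"path": "/mnt/druid/segment-cache", "maxSize": 130000000000}]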

Hi Fangjin,

I was able to resolve the issue. It was not the segment cache; it was again complaining about the temp directory not existing.

Thanks a lot :)

Hi Prerna,

How did you fix it? I am seeing exactly the same issue in my application.

Regards,

Des

Hi Des,
IIRC, the above issue was resolved by setting -Djava.io.tmpdir to a directory with read/write permissions in the JVM arguments.
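
A quick way to sanity-check what the JVM actually resolves java.io.tmpdir to (plain java, nothing Druid-specific; the path is an example):

    # prints the effective java.io.tmpdir, including any -D override
    java -Djava.io.tmpdir=/mnt/druid/tmp -XshowSettings:properties -version 2>&1 | grep tmpdir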

Hi,

I looked at the code, and the code which throws the exception is …

Hi Nishant,
I'm using Druid 0.12.3 in cluster mode with a three-node cluster, and I'm getting the same error:

"java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1541165112739-0 to 1541165112739-9999)"

I have set "-Djava.io.tmpdir=tmp" and given it 777 permissions, but the error is still present and I have had no luck getting data ingestion working.

Any help would be greatly appreciated…

Thanks,

Kiran