Hello,
Any help with this issue would be appreciated, and I’m hoping there is a way to retry or correct it. We encountered an error in the indexing logs that, as of now, appears to have cost us an entire day’s worth of data. Hopefully it can be recovered or retried somehow.
We have a Druid instance with S3-backed deep storage that receives realtime data via Tranquility. Just in case, we checked whether there were any files on disk in the Java temporary directory on the Tranquility server. We found nothing other than empty folders with names like 1487276182596-0, though we did not expect files to be persisted there in this setup.
In the indexing log for the interval (an entire day), we see the error message below. It says there is no space left on the device, but that does not appear to be true: there is plenty of memory and disk space on the boxes. We also cannot find the location of any temporary or interim commits.
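For anyone who wants to double-check the same thing: "No space left on device" can also be reported when a filesystem runs out of inodes even though free bytes remain, and the directory a task writes to may sit on a different volume than the one that was inspected. Below is a minimal Python sketch of that kind of check. The function name space_report and the use of the system temp directory are assumptions for illustration only; they are not taken from our Druid configuration, and the path should be replaced with whatever directory the indexing task actually writes to.

```python
import os
import shutil
import tempfile


def space_report(path):
    """Return free bytes and free inodes for the filesystem that holds `path`.

    Hypothetical helper for illustration; works on Unix-like systems only,
    because os.statvfs is not available on Windows.
    """
    usage = shutil.disk_usage(path)  # total / used / free, in bytes
    vfs = os.statvfs(path)           # includes inode counters
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "total_inodes": vfs.f_files,
        # Inodes available to unprivileged users; some filesystems report 0
        # for inode counters because they allocate inodes dynamically.
        "free_inodes": vfs.f_favail,
    }


if __name__ == "__main__":
    # Assumption: the system temp directory stands in for the task's
    # working directory. Substitute the real path before relying on this.
    report = space_report(tempfile.gettempdir())
    for key, value in report.items():
        print(f"{key}: {value}")
```

Running this on the box that executes the indexing task, rather than only on the Tranquility server, would show whether either bytes or inodes are exhausted on the volume in question.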
Is there any possible way to retry?
This is the full and only content written to the indexing log for the interval in S3, and we also see that no segment was written to S3 for the interval.
2017-02-24T00:16:34,003 ERROR [abc-2017-02-23T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[abc]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.io.IOException, exceptionMessage=No space left on device, interval=2017-02-23T00:00:00.000Z/2017-02-24T00:00:00.000Z}
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method) ~[?:1.8.0_74]
at java.io.FileOutputStream.write(FileOutputStream.java:326) ~[?:1.8.0_74]
at com.google.common.io.ByteStreams.copy(ByteStreams.java:179) ~[guava-16.0.1.jar:?]
at com.google.common.io.ByteSource.copyTo(ByteSource.java:255) ~[guava-16.0.1.jar:?]
at com.google.common.io.ByteStreams.copy(ByteStreams.java:119) ~[guava-16.0.1.jar:?]
at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:873) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.IndexMerger.merge(IndexMerger.java:423) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:244) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:217) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:548) [druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42) [druid-common-0.9.1.1.jar:0.9.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
2017-02-24T00:16:34,048 ERROR [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Failed to finish realtime task: {class=io.druid.indexing.common.task.RealtimeIndexTask, exceptionType=class com.metamx.common.ISE, exceptionMessage=Exception occurred during persist and merge.}
com.metamx.common.ISE: Exception occurred during persist and merge.
at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:671) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:405) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
2017-02-24T00:16:34,049 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[RealtimeIndexTask{id=index_realtime_abc_2017-02-23T00:00:00.000Z_0_0, type=index_realtime, dataSource=abc}]
com.metamx.common.ISE: Exception occurred during persist and merge.
at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:671) ~[druid-server-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:405) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]