Failed Hadoop indexing task

Hi,

I’m trying to run a batch ingestion task and it keeps failing with this error:

```
2016-06-06T06:35:08,521 INFO [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T06:35:09,085 WARN [Thread-56] org.apache.hadoop.mapred.LocalJobRunner - job_local1392479455_0001
java.lang.Exception: java.io.IOException: No such file or directory
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
	at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.8.0_91]
	at java.io.File.createTempFile(File.java:2024) ~[?:1.8.0_91]
	at java.io.File.createTempFile(File.java:2070) ~[?:1.8.0_91]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:558) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_91]
2016-06-06T06:35:09,650 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-06-06T06:35:09,807 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - map 100% reduce 99%
2016-06-06T06:35:09,808 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local1392479455_0001 failed with state FAILED due to: NA
2016-06-06T06:35:09,912 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 38
```

What is causing this?

Thanks

In the task log I see:

```
druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
```

but nothing is written to that directory. The MapReduce temp files are written to

```
/tmp/hadoop-root/mapred/
```

I just moved from 0.9.0 to 0.9.1-rc1 and now I’m getting:

```
2016-06-06T18:11:58,758 INFO [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-06-06T18:11:58,759 WARN [Thread-43] org.apache.hadoop.mapred.LocalJobRunner - job_local149987833_0002
java.lang.Exception: java.io.IOException: No such file or directory
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.io.IOException: No such file or directory
	at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[?:1.7.0_101]
```

This is the same schema, and the data file hasn’t moved.

Are you using 0.9.1-rc1?

I’m running Druid locally in “quickstart” mode. I forgot to run `bin/init` before starting all the processes; doing that seems to have fixed my problem.

Did you ever find out what was causing this?
I’m running into it now with S3, when Druid is halfway through ingesting a gz file.

Not sure what file it’s not finding.

Seems like you might be running into the same issue here, where the tmp directory needs to exist beforehand:
https://groups.google.com/forum/#!topic/druid-user/YwKDLrr1ZOI/discussion
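If it is the same issue, a minimal workaround sketch is just to pre-create the directory by hand before submitting the task. The path below mirrors the relative `hadoopWorkingPath` quoted earlier in this thread and is an assumption; resolve it against whatever directory you actually launch the Druid services from.

```shell
# Pre-create the Hadoop working directory so the reducer's createTempFile()
# call has somewhere to write. Path is the one from the config above.
mkdir -p var/druid/hadoop-tmp

# Sanity check: it should exist and be writable by the user running the task.
ls -ld var/druid/hadoop-tmp
```

(Running `bin/init` from the quickstart accomplishes the same thing, since it sets up the `var/` tree the example configs point at.)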

Also, it looks like your `hadoopWorkingPath` is a relative path; could that be causing problems?

```
druid.indexer.task.hadoopWorkingPath: var/druid/hadoop-tmp
```
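A relative path gets resolved against the working directory of whichever process runs the task, which can differ between services. One way to rule that out is to use an absolute path instead; the exact location below is hypothetical, pick whatever suits your setup:

```
druid.indexer.task.hadoopWorkingPath: /var/druid/hadoop-tmp
```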

Thanks,

Jon