Batch ingestion issue

I am using Druid 0.9.0-rc3. I am trying to make sure my AWS S3 and RDS (Postgres) instances are working properly, and I have modified the _common config accordingly (a sketch of the kind of settings involved is below). When I try to batch ingest the Wikipedia example JSON, I get the following error. Any ideas on fixing the issue?
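For reference, a minimal sketch of what those _common/common.runtime.properties changes typically look like for S3 deep storage plus a PostgreSQL metadata store (the bucket, host, and credentials here are placeholders, not real values):

druid.extensions.loadList=["druid-s3-extensions", "postgresql-metadata-storage"]

# Deep storage on S3 (placeholder bucket and keys)
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=<ACCESS_KEY>
druid.s3.secretKey=<SECRET_KEY>

# Metadata store on RDS PostgreSQL (placeholder host and password)
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://<rds-endpoint>:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=<PASSWORD>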

2016-03-30T01:21:14,105 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2016-03-30T01:20:57.548Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.0-rc3.jar:0.9.0-rc3]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0-rc3.jar:0.9.0-rc3]
	... 7 more
2016-03-30T01:21:14,113 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2016-03-30T01:20:57.548Z",
  "status" : "FAILED",
  "duration" : 12287
}

Are you using EMR for your batch ingestion?
If so, there are more configs you’ll have to override: http://imply.io/docs/latest/ingestion-batch

Otherwise, can you post the full task log?
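For context, the extra overrides for EMR typically go into the jobProperties of the spec's tuningConfig; a rough sketch with placeholder credentials (the linked doc is authoritative for the exact list):

"jobProperties" : {
  "fs.s3.awsAccessKeyId" : "<YOUR_ACCESS_KEY>",
  "fs.s3.awsSecretAccessKey" : "<YOUR_SECRET_KEY>",
  "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
  "fs.s3n.awsAccessKeyId" : "<YOUR_ACCESS_KEY>",
  "fs.s3n.awsSecretAccessKey" : "<YOUR_SECRET_KEY>",
  "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
  "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
}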

Hi Yang,

I am trying to perform batch ingestion using EMR and am getting the same error. I have attached my full log; the keys were removed on purpose.

Thanks
Anindita Dey

hadoop-druid-index-error (117 KB)

Anindita, please try the docs I linked in my previous response.

Thanks Yang, I was able to do the batch ingestion once I placed both Hadoop and Druid in the same cluster.

However, whenever I try to connect to Hadoop remotely, the Hadoop container throws the error below.

Diagnostics:
Application application_1430204789433_38485 failed 2 times due to AM Container for appattempt_1430204789433_38485_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
	at org.apache.hadoop.util.Shell.run(Shell.java:418)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1. Failing this attempt. Failing the application.

Hi Yang,

Another question: when I tried the job with a normal .gz file, it worked. But when I try an encrypted .gz file, the job fails.

I know you have mentioned this note on your site:

"Note that this method uses Hadoop's builtin S3 filesystem rather than Amazon's EMRFS, and is not compatible with Amazon-specific features such as S3 encryption and consistent views. If you need to use those features, you will need to make the Amazon EMR Hadoop JARs available to Druid through one of the mechanisms described in the Using other Hadoop distributions section."

But do you think there is any workaround I can use?

Thanks a lot.

Anindita, did you place all your Hadoop configuration files on the classpath of the server you are trying to run?
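In practice, "on the classpath" means copying the cluster's Hadoop XML configs into Druid's common config directory so every Druid process picks them up; a minimal sketch, assuming a stock Hadoop config location and a quickstart-style Druid layout:

cp /etc/hadoop/conf/core-site.xml \
   /etc/hadoop/conf/hdfs-site.xml \
   /etc/hadoop/conf/yarn-site.xml \
   /etc/hadoop/conf/mapred-site.xml \
   conf/druid/_common/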

What do you mean by workaround? The workarounds are described in the document I linked.

Hi, all

I experienced the same error as Anindita, but the solution link doesn’t work.

Could you please share the solution here again?

Best Regards,

Kamolphan

Dear Fangjin Yang,

I really appreciate your contribution to the Druid community.
I would like to bother you with a question.

I am facing a problem submitting the wikiticker job with Druid 0.9.2 bundled in HDP 2.6.0.3. (There is no such problem with the standalone 0.10.0 version on my local PC.)

However, I must use the HDP version for compatibility with Apache Ambari and the rest of my existing cluster.

This is how I submit the job.

[centos@dev-server1 druid]$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task

{"task":"index_hadoop_wikiticker_2017-06-15T11:04:18.145Z"}

But it appeared as FAILED in the Coordinator console with the following error (I tried many times with a few different settings that I thought might solve it, e.g. changing the UNIX timezone to my local time or adding mapred.job.classloader to jobProperties as described in this link, but to no avail):

2017-06-15T11:04:31,361 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2017-06-15T11:04:18.145Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:208) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387) ~[?:?]
	at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318) ~[?:?]
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) ~[?:?]
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) ~[?:?]
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) ~[?:?]
	at io.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:116) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:349) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:91) ~[druid-indexing-hadoop-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.2.6.0.3-8.jar:0.9.2.2.6.0.3-8]
	... 7 more
2017-06-15T11:04:31,375 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wikiticker_2017-06-15T11:04:18.145Z] status changed to [FAILED].
2017-06-15T11:04:31,378 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2017-06-15T11:04:18.145Z",
  "status" : "FAILED",
  "duration" : 6650
}

Thank you very much.

log.txt (162 KB)

Forgive me for asking something obvious, but I experienced something similar when going through the wikiticker example. Are you certain the JSON file was placed into HDFS in the location specified in your spec (e.g. /user/druid/…)? FWIW, I got all of this working with the files on the local filesystem, even though that is not ideal because you have to distribute the file to all the nodes doing the indexing if you go that route.
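For reference, the part of the quickstart spec that has to match the file's actual location is the paths entry in the ioConfig; a sketch assuming the quickstart spec layout, with the HDFS path taken from the error above (adjust it to wherever the file really lives):

"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "static",
    "paths" : "hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json"
  }
}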

Your task is failing due to

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://dev-server1.c.sertis-data-center.internal:8020/user/druid/quickstart/wikiticker-2015-09-12-sampled.json

This means you are asking Druid to index a file that does not exist on HDFS.

To fix this issue, you need to copy the file from your local machine to HDFS under /user/druid/quickstart/wikiticker-2015-09-12-sampled.json.
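A minimal sketch of that copy, assuming the sampled file is still in the quickstart directory of the local Druid install and the hdfs CLI is on the PATH:

hdfs dfs -mkdir -p /user/druid/quickstart
hdfs dfs -put quickstart/wikiticker-2015-09-12-sampled.json /user/druid/quickstart/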

Thank you both, Slim and mstephe, for your kind replies.

I did put the file into HDFS after that, and then ran into another error caused by a security setting in my cluster, which is separate from Druid.
I will deal with that problem later on since it is not a Druid problem.

Thanks!

Best Regards,
Kamolphan