Batch Ingestion Failed

Dear all,

I'm running a batch ingestion and it fails with the following exception:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:238) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	... 7 more
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:391) ~[druid-indexing-hadoop-0.12.3.jar:0.12.3]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.3.jar:0.12.3]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.3.jar:0.12.3]
	... 7 more



When I view the log file in YARN, I see this exception:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1455)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1452)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1385)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
	... 8 more
Caused by: java.lang.ClassNotFoundException: Class io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
	... 10 more


I know that IndexGeneratorJob is in druid-indexing-hadoop-0.12.3.jar, and I put druid-indexing-hadoop-0.12.3.jar under hadoop-dependencies and extensions/druid-hdfs-storage/, but it doesn't help.
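
One quick way to confirm that the missing class really is packaged in a given jar is to list the jar's entries. The following is only a minimal sketch: the helper name jar_contains_class is made up for illustration, and the jar path you pass in depends on your own installation.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the given binary class name."""
    # A binary name such as a.b.Outer$Inner maps to the jar entry a/b/Outer$Inner.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, calling jar_contains_class with the path to druid-indexing-hadoop-0.12.3.jar and "io.druid.indexer.IndexGeneratorJob$IndexGeneratorOutputFormat" should return True if the jar is intact. Note that this only shows the class is in the jar, not that the jar is visible to the YARN application master at runtime.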

This issue has taken me several days. Please help me out. Thanks.

Regards,
Kien




It seems like your classpath may be in a strange state, or the cluster is misconfigured. I would recommend trying the Druid quickstart and Hadoop tutorial for an example of a working Druid+Hadoop deployment:

http://druid.io/docs/latest/tutorials/index.html

http://druid.io/docs/latest/tutorials/tutorial-batch-hadoop.html

Thanks,

Jon

Thank you very much for your response, Jon. I appreciate your time.
I ran it successfully locally, but when I ran it in production, I got this issue.

The difference is that locally we can define yarn.application.classpath in yarn-site.xml, but in production the MiddleManager is not co-located with Hadoop, so if we define it like this, the Druid job fails.

yarn.application.classpath

/usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/, /usr/local/hadoop/share/hadoop/common/lib/, /usr/local/hadoop/share/hadoop/hdfs/, /usr/local/hadoop/share/hadoop/hdfs/lib/, /usr/local/hadoop/share/hadoop/mapreduce/, /usr/local/hadoop/share/hadoop/mapreduce/lib/, /usr/local/hadoop/share/hadoop/yarn/, /usr/local/hadoop/share/hadoop/yarn/lib/

I see that the Druid ingestion submits two jobs to YARN: the first one, determine-partitions, runs successfully, but the second one, index-generator, fails.

Thanks,

Kien

Hm, what version of Hadoop are you using?

It seems like there is a Hadoop bug that can cause the issue you’re seeing:

https://groups.google.com/forum/#!topic/druid-user/5mN2bnEGgsY

https://issues.apache.org/jira/browse/MAPREDUCE-5957

> I know that IndexGeneratorJob is in druid-indexing-hadoop-0.12.3.jar, and i put druid-indexing-hadoop-0.12.3.jar under hadoop-dependencies and extensions/druid-hdfs-storage/ but it doesn't help.

This step isn't needed. Druid normally copies the druid-indexing-hadoop jar to Hadoop itself, so it doesn't need to be in hadoop-dependencies or the HDFS storage extension.
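
If the Hadoop bug linked above turns out to be the cause, one thing that may be worth trying without upgrading Hadoop is changing the classloading-related jobProperties in the task's tuningConfig. As I read MAPREDUCE-5957, the failure shows up when mapreduce.job.classloader is enabled and a custom output format is used, so the sketch below turns that setting off and uses mapreduce.job.user.classpath.first instead. This is an assumption about your setup, not a confirmed fix, and disabling the job classloader can bring back dependency conflicts it was meant to avoid. The helper with_job_properties is a made-up name, and the skeleton task leaves out dataSchema and ioConfig.

```python
import copy
import json


def with_job_properties(task, props):
    """Return a copy of a Hadoop index task with extra jobProperties merged in."""
    patched = copy.deepcopy(task)  # leave the caller's task untouched
    spec = patched.setdefault("spec", {})
    tuning = spec.setdefault("tuningConfig", {"type": "hadoop"})
    tuning.setdefault("jobProperties", {}).update(props)
    return patched


# Skeleton task only; a real spec also needs dataSchema and ioConfig.
task = {"type": "index_hadoop", "spec": {"tuningConfig": {"type": "hadoop"}}}

# Assumed workaround: avoid the isolated job classloader and instead
# ask MapReduce to put the job's own jars first on the classpath.
workaround = {
    "mapreduce.job.classloader": "false",
    "mapreduce.job.user.classpath.first": "true",
}

print(json.dumps(with_job_properties(task, workaround), indent=2))
```

The printed JSON shows where the two properties would sit in the task spec (spec.tuningConfig.jobProperties); the values are strings because they are passed through to the Hadoop job configuration.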

Thanks,

Jon

Dear Jon,
Sorry for my late response.

I'm using Hadoop 2.4.1, but upgrading our production cluster is too risky.

Thanks,

Kien