ClassNotFoundException with Druid ORC Extension

Hello,

I am trying to load data into Druid from HDFS using the ORC extension, and I got the following error because of a Jackson version conflict:

java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

I resolved this by setting "mapreduce.job.classloader": "true" in the spec, as suggested at http://druid.io/docs/0.9.2/operations/other-hadoop.html

Now the MR jobs are failing for the following reason:

Error: java.lang.RuntimeException: readObject can't find class
	at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
	at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:120)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:372)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:754)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.orc.OrcNewSplit not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)

I checked that hive-exec-2.0.0.jar is on the classpath, so I don't know what is going wrong. I am using Druid 0.9.2 and Hadoop 2.7.1.
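One way to double-check the "jar is on the classpath" claim is to confirm which jars actually contain the missing class. Below is a minimal sketch, not something from the Druid or Hadoop codebase: the class name JarClassFinder and its methods are made up for illustration, and it assumes the jars have been copied from the job's working directory to a local directory passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars in a directory that contain a given class.
public class JarClassFinder {

    // Converts a fully qualified class name to its entry path inside a jar.
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the jars directly under dir (not recursive) that contain the class.
    static List<Path> findJarsContaining(Path dir, String className) throws IOException {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is a local copy of the job's jar directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String className = args.length > 1
                ? args[1]
                : "org.apache.hadoop.hive.ql.io.orc.OrcNewSplit";
        List<Path> hits = findJarsContaining(dir, className);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + dir + " contains " + className);
        }
        for (Path jar : hits) {
            System.out.println(jar);
        }
    }
}
```

If the class does show up in a jar here but the map task still cannot find it, the jar is probably not visible to the classloader that deserializes the input split, which is a different problem from the jar being absent.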

Thanks.

Is the jar uploaded to HDFS as a dependency?

This is supposed to be done by Druid, and you can check whether it exists under the temporary working directory.

All the required jars are present in the working directory, but the job still fails with ClassNotFoundException. Is any other config needed?

I don’t think you need extra settings.

FYI, this extension is not part of Druid core, so as Druid committers we don’t have much involvement with it.

Sorry.

I have contacted the author:

https://github.com/druid-io/druid/pull/3019#issuecomment-280543932

Now I am trying to ingest a CSV source, but I am getting a Jackson error in the MR jobs:

 Error in custom provider, java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
  at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
  at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
  while locating com.fasterxml.jackson.databind.ObjectMapper annotated with interface io.druid.guice.annotations.Json
  while locating com.fasterxml.jackson.databind.ObjectMapper
    for the 1st parameter of io.druid.guice.JsonConfigurator.<init>(JsonConfigurator.java:64)
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:40)
  while locating io.druid.guice.JsonConfigurator
    for the 2nd parameter of io.druid.guice.JsonConfigProvider.inject(JsonConfigProvider.java:188)
  at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:131)

This looks like a Jackson version mismatch. My Hadoop uses Jackson 2.2.3 and Druid uses 2.4.6. But all the newer Jackson jars are already in the HDFS working directory, and I am using "mapreduce.job.classloader": "true" in my spec. Is it something related to the classloader?
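To see whether the classloader is really picking up the older Jackson, it can help to print where the Jackson classes are loaded from inside the failing JVM. The following is only a sketch under stated assumptions: ClassOrigin and describe are hypothetical names, not part of Druid or Hadoop, and the code would have to run in (or be called from) the same process and classloader as the task to be meaningful. As far as I can tell, JsonFactory.requiresPropertyOrdering() is not present in the 2.2.x line, so a 2.2.3 jar showing up here would be consistent with the NoSuchMethodError.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports which jar (if any) a class was loaded from.
public class ClassOrigin {

    // Describes where the class comes from when resolved through the given loader.
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "<bootstrap or unknown>"
                    : src.getLocation().toString();
            return className + " <- " + location + " via " + c.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("com.fasterxml.jackson.core.JsonFactory", loader));
        System.out.println(describe("com.fasterxml.jackson.databind.ObjectMapper", loader));
    }
}
```

If the reported location is a Hadoop-provided jackson-core jar rather than the one Druid shipped to the working directory, the job classloader setting is not isolating Jackson the way the spec intends.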

Resolved the CSV ingestion. It failed because "druid-orc-extensions" was present in druid.extensions.loadList. I removed it and the MR job succeeded. But I am still trying to get ORC ingestion working and to resolve its error.

Hi, I ran into the same problem recently. Have you figured out how to solve this issue? Looking forward to your reply. BTW, I want to ingest ORC data.

On Thursday, February 16, 2017 at 11:53:46 PM UTC+8, Slim Bouguerra wrote: