Spark + Druid Tranquility - library version conflict

I’ve posted an issue I’m facing when using Spark with Tranquility here: http://stackoverflow.com/questions/34431329/spark-druid-tranquility-library-version-conflict

I’m writing here just to bring it to the attention of Druid users; I don’t want to duplicate the whole post in this thread.

I wonder whether Tranquility-Spark works with a standard Spark build.

Any pointers to resolve the conflict?

Hey Ashish,

I’m guessing this is due to mixing different versions of jackson-databind with jackson-datatype-joda (possibly 2.6.1 of joda with an older databind). Would it work for you to include the same version of both jackson jars on Spark’s classpath? I think you shouldn’t have to recompile Spark; just get it to load the newer jacksons.
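For example, if your job is built with sbt, something like this on the application side might do it (just a sketch; the FasterXML coordinates are the standard ones, but 2.6.1 is only an assumed version, use whatever you’re actually targeting):

// Pin jackson-databind and jackson-datatype-joda to the same release so the jars
// shipped with the application agree with each other (versions are an assumption).
libraryDependencies ++= Seq(
  "com.fasterxml.jackson.core"     % "jackson-databind"      % "2.6.1",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.6.1"
)

// sbt 0.13: force the same version even when a transitive dependency asks for an older one.
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core"     % "jackson-databind"      % "2.6.1",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.6.1"
)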

If not then we can possibly bundle tranquility-spark specifically with an older version of jackson.

Either way, if you could update this thread or https://github.com/druid-io/tranquility/issues/76 with whether you do end up finding a workaround, that would be super helpful.

Thanks!

Thanks Gian for your response.

I have both versions of the jackson jars on the classpath, and Spark is picking up the old one bundled with Spark rather than the newer one bundled with my application jar.

I have received an answer on how to force the newer one at http://stackoverflow.com/questions/34431329/spark-druid-tranquility-library-version-conflict

I’m going to try that when I’m back from vacation next week. I will update the post once I’ve tried it.
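For reference, as far as I can tell the answer boils down to Spark’s experimental “user classpath first” settings, roughly something like this (untested on my side yet; property names as of Spark 1.x, and they can also be passed via --conf to spark-submit):

// Sketch only: ask Spark to prefer the application's jars over its own bundled ones.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("tranquility-spark-job")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")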

Thanks again

Ashish

Upgrading jackson tends to break things on the druid-hadoop side, which is why we haven’t done it in the project. I’d like to know how Spark manages to ship an updated jackson version without breaking Hadoop.

I had very similar issues with https://github.com/metamx/druid-spark-batch

Hi, All,

I had the same problem when submitting a Spark job with Tranquility to CDH 5.4, but in my case the older jackson-databind jar comes from CDH itself: /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar

The options listed in the stackoverflow link didn’t help in my case. I plan to build a metamx scala-utils.jar against the older version of jackson-databind (obviously that needs some code changes, hopefully not too many)… does anyone have any other suggestions on this?

Thanks,

Charles Chao

Hey Charles,

Jackson is usually pretty good about not breaking backwards compatibility, so if you can figure out how to get the newer versions loaded on CDH, that would probably work best. Otherwise, yeah, probably try building tranquility with an older Jackson and see if it works. If you do find a solution, I would really appreciate it if you took the time to post it here.

IIRC CDH has an option that does something like user-classpath-goes-first; maybe that’d help?

Hi, Gian,

Thanks for your suggestions. I actually tried Spark’s experimental “user-classpath-first” option before, but it didn’t help. As for CDH, since it’s our prod environment with many jobs running, I’d prefer not to change its configuration yet.

I tried building the metamx scala-util with an older version of the Jackson datatype jar, and now I’m getting a new error. I have pasted the error message below… does anyone have any insights about it?

Thanks,

Charles Chao

Caused by: com.google.inject.CreationException: Guice creation errors:

1) An exception was caught and reported. Message: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
  at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)

2) No implementation for javax.validation.Validator was bound.
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)

2 errors
	at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
	at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
	at com.google.inject.Guice.createInjector(Guice.java:95)
	at com.google.inject.Guice.createInjector(Guice.java:72)
	at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:57)
	at com.metamx.tranquility.druid.DruidGuicer$.<init>(DruidGuicer.scala:39)
	at com.metamx.tranquility.druid.DruidGuicer$.<clinit>(DruidGuicer.scala)
	... 17 more
Caused by: javax.validation.ValidationException: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
	at javax.validation.Validation$GenericBootstrapImpl.configure(Validation.java:271)
	at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:110)
	at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
	at com.google.inject.spi.Elements.getElements(Elements.java:101)
	at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)

This error happened on the driver, before it could even start any tasks. Switching back to datatype 2.6.1 makes this problem go away and tasks can be started, but they then fail with the original “NoSuchFieldError” on each executor. So apparently there are other dependencies on version 2.6.1; I cannot simply use a lower version.

At this point I have run out of options. Maybe the only thing left is to upgrade the jar file in CDH, but that’s not realistic for me.

I’ve searched, but it appears that Tranquility is the only option for writing to Druid from Spark/Spark Streaming. It would be really disappointing if I had to change the streaming pipeline just because of this issue. Any suggestions are welcome.

Thanks,

Charles

You might be able to work around the conflict by deploying your spark job as a self-contained jar that uses relocated classes. This should allow both versions of jackson to coexist, one used by Spark internals and one used by your code. You can do that with the maven shade plugin: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html
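If the job is built with sbt rather than maven, sbt-assembly has an equivalent shading feature; a rough sketch (needs sbt-assembly 0.14.x or later, and the target package name here is just an example):

// build.sbt: relocate the jackson packages bundled in the fat jar so they don't
// clash with the jackson that ships with Spark/CDH.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll
)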

Another option is rebuilding tranquility to use a different version of jackson. I’m not sure which one would work, but there might be one out there that will. If anyone figures that out, it would be super helpful to hear what works, since we could make that change in the official tranquility-spark.

I tried rebuilding with the older version of the dependency jar yesterday, and I also set user classpath first (only tested in local mode so far); both times I got this error message… any insights on this? I’m not sure whether this error means I’ve actually moved a step forward.

I did try adding hibernate-validator to my dependencies, but that didn’t help.

I have these druid and tranquility related dependencies in my project:

"io.druid" % "druid" % "0.7.3",
"io.druid" % "druid-processing" % "0.7.3",
"io.druid" % "tranquility-core_2.10" % "0.6.4",
"io.druid" % "tranquility-spark_2.10" % "0.6.4",
"org.hibernate" % "hibernate-validator" % "4.2.0.Final",
"org.hibernate" % "hibernate-validator-annotation-processor" % "4.1.0.Final"

Thanks,

Charles Chao

============ error message ===================

Hey folks,

The next version of tranquility will have a downgraded jackson to match the version used by Druid: https://github.com/druid-io/tranquility/pull/81. Hopefully that fixes these problems.

If anyone could build from master and try that out in their environment, that would be incredibly helpful. The easiest way is probably to run “sbt +publishM2” to publish to your local maven repository, or “sbt +publish-local” to publish to your local ivy repository.
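Once it’s published, you can point your build at the local artifacts, along these lines (the version string below is only a placeholder; use whatever version master publishes for you):

// Sketch: depend on the locally published tranquility build.
resolvers += Resolver.mavenLocal  // if you used "sbt +publishM2"

libraryDependencies ++= Seq(
  "io.druid" %% "tranquility-core"  % "0.7.0-SNAPSHOT",  // placeholder version
  "io.druid" %% "tranquility-spark" % "0.7.0-SNAPSHOT"   // placeholder version
)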

Thanks Gian.

This pull request has only one change:

-val jacksonTwoVersion = "2.6.3"
+val jacksonTwoVersion = "2.4.6"

However, I think scala-util still has a dependency on jackson 2.6.

Charles

I think it isn’t using anything that is in 2.6.3 but not in 2.4.6, so it should be enough to override the version here. At least that’s the idea.

I had the same issue. The assembled jar that I made did not contain hibernate-validator, and I couldn’t manage to add it to the jar.
Instead, I passed hibernate-validator.jar to spark-submit with the --jars option, and it worked.

On Friday, January 8, 2016 at 9:33:16 AM UTC+9, Charles Chao wrote: