Error running Druid indexing task on Hadoop 2.4 with YARN

Hi everyone,
I have a problem running the Druid Hadoop indexing task. I have a Hadoop 2.4 cluster with YARN, and I run Druid 0.7.3 on it. Realtime loading is currently working fine, but now I would like to index data from Hadoop.

I recompiled Druid from source, with all Hadoop versions changed to 2.4.0 (in the poms and in the DEFAULT_HADOOP_COORDINATES constant).

I run the overlord, and it starts fine.

Then I start the index task with: curl -X POST -H 'Content-Type: application/json' -d @etc/druid/indexer/index-hadoop-task.spec localhost:19083/druid/indexer/v1/task
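For reference, the submission can also capture the task id from the overlord's JSON response (it replies with {"task":"<id>"}) and then poll the task status endpoint; the port and spec path below are just the ones from my setup:

```shell
# Submit the indexing task and extract the task id from the overlord's reply.
TASK_ID=$(curl -s -X POST -H 'Content-Type: application/json' \
  -d @etc/druid/indexer/index-hadoop-task.spec \
  localhost:19083/druid/indexer/v1/task \
  | sed 's/.*"task":"\([^"]*\)".*/\1/')

# Poll the task status on the overlord.
curl -s "localhost:19083/druid/indexer/v1/task/$TASK_ID/status"
```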

Then I get an error in the map task logs:

2015-06-21 14:57:14,792 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-06-21 14:57:14,813 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:457)
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:389)
	at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:43)
	at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:33)
	at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:44)
...



I see in the logs of the index task (the peon, I think) that it uploads Jackson version 1.8.8 to HDFS:

2015-06-21T13:04:45,752 INFO [task-runner-0] io.druid.indexer.JobHelper - Uploading jar to path[/tmp/druid-indexing/classpath/jackson-core-asl-1.8.8.jar]
2015-06-21T13:04:45,790 INFO [task-runner-0] io.druid.indexer.JobHelper - Uploading jar to path[/tmp/druid-indexing/classpath/jackson-mapper-asl-1.8.8.jar]

I believe this is the problem, because JsonFactory in 1.8.8 does not have a requiresPropertyOrdering method. I suspect that YARN on Hadoop 2.4 uses jackson.databind with version 1.8.8 on the classpath. I intentionally don't add Jackson to the classpath of the overlord.
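To confirm which Jackson jars Hadoop itself puts on the job classpath, something like this on a cluster node can help (assuming the hadoop CLI is on PATH; it prints a ':'-separated classpath):

```shell
# Show every Jackson jar Hadoop contributes to the classpath,
# one per line, so version conflicts are easy to spot.
hadoop classpath | tr ':' '\n' | grep -i jackson
```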

How do you guys handle this conflict? I would be grateful for any help with this.

Ping, anyone can help?

Currently, I have worked around this issue by 1) compiling the hadoop-client and hdfs-storage extensions into the Druid jar and 2) shading the conflicting jars with the Maven shade plugin. This defeats the purpose of Druid's extension system, but I couldn't overcome this problem any other way…
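For anyone hitting the same thing, a sketch of the shade-plugin relocation I mean looks roughly like this; it assumes the conflict is in the com.fasterxml.jackson packages, and the shadedPattern prefix is just an illustrative choice:

```xml
<!-- Sketch: relocate Druid's Jackson 2.x so it cannot collide with the
     Jackson that Hadoop puts on the MR job classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.fasterxml.jackson</pattern>
            <shadedPattern>shaded.com.fasterxml.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```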

On Sunday, June 21, 2015 at 15:35:25 UTC+2, Krzysztof Zarzycki wrote:

Are you running CDH by any chance or is this stock hadoop?

I believe you are hitting the same class of problems as described here:

https://groups.google.com/forum/#!msg/druid-development/jNxhMZpp-rc/XwAFP2xYe60J

and that there are dependency conflicts between your version of Druid and Hadoop. The easiest workaround is to recompile Druid with the same version of Jackson as in your Hadoop distribution.

In general, unfortunately the Hadoop classpath gets mixed up with the application classpath in MR jobs. You have no option but to work within the constraints of the dependency versions Hadoop uses. That means changing those dependency versions in Druid, rebuilding, and using that build with your particular Hadoop version.
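As a sketch, aligning the version could look like this in Druid's parent pom before rebuilding; the jackson.version property name and the 2.2.3 value are assumptions here, so check how your Druid checkout actually declares the com.fasterxml.jackson.* dependencies and which version your Hadoop distribution ships:

```xml
<!-- In Druid's parent pom.xml: pin Jackson to the version your Hadoop
     distribution ships. Property name and version are illustrative. -->
<properties>
  <jackson.version>2.2.3</jackson.version>
</properties>
```

Then rebuild with something like mvn clean install -DskipTests and use the resulting artifacts against your cluster.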

– Himanshu