Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer

Hi Druid Gurus,

We are in the midst of upgrading our Druid version to 0.9.1.1 and have been facing issues with the index task. We are currently on Hadoop 2.6.0, and the failure happens at the map-reduce stage. Below is the error:

ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: class
com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.
(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
	at java.lang.ClassLoader.defineClass1(Native Method)

We had a similar issue with version 0.8.0 and resolved it by replacing the jackson-*-2.4.4.jar files with jackson-*-2.2.3.jar.
Should we take a similar approach?

Below is my command to start the overlord, along with my index spec.

#run_overlord
export HADOOP_CONF_DIR=/x/home/pp_paz_pci_admin/druid-0.9.1.1/config
cp null overlord.log
export CLASSPATH=$CLASSPATH:$HADOOP_HOME:$HADOOP_CONF_DIR:/x/hdp/2.2.9.0-3393/hadoop/lib/*:/x/hdp/2.2.9.0-3393/hadoop/lib/native/*
echo $CLASSPATH
nohup java -classpath lib/*:conf/druid/overlord:conf/druid/_common:$CLASSPATH io.druid.cli.Main server overlord > overlord.log &


pageview.json:

{
	"type": "index_hadoop",
	"spec": {
		"dataSchema": {
			"dataSource": "pageviewshdp",
			"parser": {
				"type": "hadoopyString",
				"parseSpec": {
					"format": "json",
					"timestampSpec": {
						"format": "auto",
						"column": "time"
					},
					"dimensionsSpec": {
						"dimensions": [
							"url",
							"user"
						],
						"dimensionExclusions": [],
						"spatialDimensions": []
					},
					"columns": [
						"time",
						"url",
						"user",
						"latencyMs"

					]
				}
			},
			"metricsSpec": [{
				"name": "views",
				"type": "count"
			}, {
				"name": "latencyMs",
				"type": "doubleSum",
				"fieldName": "latencyMs"
			}],
			"granularitySpec": {
				"type": "uniform",
				"segmentGranularity": "DAY",
				"queryGranularity": "NONE",
				"intervals": ["2015-09-01/2015-09-02"]
			}
		},
		"ioConfig": {
			"type": "hadoop",
			"inputSpec": {
				"type": "static",
				"paths": "hdfs://zzz/apps/dt/merchant/druid_input.json"
			}
		},
		"tuningConfig": {
			"type": "hadoop"
		}
	}
}

Hi Karteek,

Can you try:

1.) Use pull-deps to grab hadoop-client:2.6.0

http://druid.io/docs/latest/operations/pull-deps.html

2.) Specify hadoop-client:2.6.0 in your hadoopDependencyCoordinates

http://druid.io/docs/latest/ingestion/batch-ingestion.html

"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.6.0"]

3.) Set "mapreduce.job.user.classpath.first": "true" in the jobProperties of the tuningConfig in your indexing task, as described in the CDH section here:

http://druid.io/docs/latest/operations/other-hadoop.html
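
For step 1, the pull-deps invocation would be something along these lines, run from the Druid install directory (the -h coordinate is the same one used in step 2):

java -classpath "lib/*" io.druid.cli.Main tools pull-deps --no-default-hadoop -h "org.apache.hadoop:hadoop-client:2.6.0"

That populates the hadoop-dependencies directory that the indexing task resolves hadoopDependencyCoordinates against.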

Jon

Hi Jon,

I am still getting the same error after implementing the steps you mentioned. Below is my index spec file. Also, in the common runtime properties I added this line:

druid.extensions.hadoopDependenciesDir=/x/home/druid-0.9.1.1/hadoop-dependencies
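
For reference, after running pull-deps the layout under that directory should look roughly like this (per the pull-deps docs; jar names are illustrative):

hadoop-dependencies/
    hadoop-client/
        2.6.0/
            hadoop-client-2.6.0.jar
            hadoop-common-2.6.0.jar
            ... (other transitive dependency jars)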

Index spec file:

{
	"type": "index_hadoop",
	"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.6.0"],
	"spec": {
		"dataSchema": {
			"dataSource": "pageviewshdp",
			"parser": {
				"type": "hadoopyString",
				"parseSpec": {
					"format": "json",
					"timestampSpec": {
						"format": "auto",
						"column": "time"
					},
					"dimensionsSpec": {
						"dimensions": [
							"url",
							"user"
						],
						"dimensionExclusions": [],
						"spatialDimensions": []
					},
					"columns": [
						"time",
						"url",
						"user",
						"latencyMs"
					]
				}
			},
			"metricsSpec": [{
				"name": "views",
				"type": "count"
			}, {
				"name": "latencyMs",
				"type": "doubleSum",
				"fieldName": "latencyMs"
			}],
			"granularitySpec": {
				"type": "uniform",
				"segmentGranularity": "DAY",
				"queryGranularity": "NONE",
				"intervals": ["2015-09-01/2015-09-02"]
			}
		},
		"ioConfig": {
			"type": "hadoop",
			"inputSpec": {
				"type": "static",
				"paths": "hdfs://hdfs-path/druid_input.json"
			}
		},
		"tuningConfig": {
			"type": "hadoop",
			"jobProperties": {
				"mapreduce.job.classloader": "true"
			}
		}
	}

Hi,

I restarted all Druid components and ran the indexing service again, and now it throws a new error, something like this:

Error: java.lang.ClassNotFoundException: javax.validation.Validator

I do have that class available in mvnrepo, which I am including under my lib directory and pointing to in the common runtime config file as below:

druid.extensions.localRepository=/x/home/druid-0.9.1.1/lib/mvnrepo

Thanks

Karteek

Hi,

Can you try:

"mapreduce.job.user.classpath.first" : "true``"

instead of:

"mapreduce.job.classloader": "true"

in your jobProperties?

In the spec you posted in the thread, it also looks like jobProperties is missing a closing brace (}).
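
Putting both suggestions together, the tail end of the spec would look something like this (a sketch):

		"tuningConfig": {
			"type": "hadoop",
			"jobProperties": {
				"mapreduce.job.user.classpath.first": "true"
			}
		}
	}
}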

Thanks,

Jon

Hi Jonathan,

Sorry for getting back so late on this issue. When I use "mapreduce.job.user.classpath.first": "true" I get the below error:

ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster - Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/mortbay/jetty/servlet/Context


Attached is the entire log from MapReduce.

Regards
Karteek

MapReduceIndexLog.txt (16.6 KB)

Karteek, does http://druid.io/docs/0.9.2-rc1/operations/other-hadoop.html help at all?

Hi Fangjin,

I was able to get this resolved. Apart from building a fat jar for indexing, I had to replace the hadoop-*-2.3.0 jars in extensions/druid-hdfs-storage with 2.6.0 jars.
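
Concretely, the swap was along these lines (a sketch; the source path for the 2.6.0 jars is just a placeholder for wherever your Hadoop distribution keeps them):

cd /x/home/druid-0.9.1.1/extensions/druid-hdfs-storage
rm hadoop-*-2.3.0.jar
cp /path/to/hadoop-2.6.0-jars/hadoop-*-2.6.0.jar .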

Also, I had to change the timezone setting to 'PST' in overlord/runtime.properties and the other jvm.config files to make my indexing task work. I wasn't too convinced by this approach; I believe I may have missed something and could well have avoided the switch to PST.

On version 0.8.0, I didn't have to change any timezone-related settings.
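
For reference, the timezone change amounted to a line like this in each jvm.config (the configs Druid ships with use -Duser.timezone=UTC, which is part of why I suspect I missed something):

-Duser.timezone=PST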

Thanks

Karteek