Can not deserialize instance of java.util.ArrayList out of VALUE_STRING when loading from Kafka

Hey Guys,

I am new to Druid and am trying to load data from an existing Kafka stream into Druid.

I am following the quickstart documentation here: http://druid.io/docs/latest/tutorials/tutorial-kafka.html

This is the JSON config file used for Tranquility: http://pastebin.com/2fmFFS9b (and this is the sample data present on the Kafka stream: http://pastebin.com/nnwX9HWN)

Unfortunately, Tranquility crashes fairly quickly with a "Can not deserialize instance of java.util.ArrayList out of VALUE_STRING" error: http://pastebin.com/ZD82XWjv

I am using http://jsonlint.com/ to validate the above JSON file, and it shows no errors.

Could anyone please let me know if I am missing something?

Thanks

Suhas

In case you are unable to access the above links:

Sample data:

{
  "Eventtype_id": 103,
  "Eventtype_name": "blah",
  "Event_timestamp": 1478598133033,
  "Device": {
    "Device_name": "blah-blu-blah",
    "Device_ipaddr": "1.1.1.1"
  },
  "Properties": {
    "If_name": "e37",
    "State_timestamp": 1478595804597,
    "txb": 0,
    "txp": 0,
    "txd": 0,
    "rxb": 0,
    "rxp": 0,
    "rxd": 0,
    "epd": 0,
    "txc": 0,
    "rxc": 0,
    "rxm": 0,
    "delta": 2318.224062609
  }
}

Tranquility kafka.json:

{
  "dataSources": {
    "linkstate-kafka": {
      "spec": {
        "dataSchema": {
          "dataSource": "linkstate-kafka",
          "parser": {
            "type": "string",
            "parseSpec": {
              "format": "json",
              "flattenSpec": {
                "useFieldDiscovery": true,
                "fields": [
                  {"type": "path", "name": "Device_name", "expr": "$.Device.Device_name"},
                  {"type": "path", "name": "Device_ipaddr", "expr": "$.Device.Device_ipaddr"},
                  {"type": "path", "name": "If_name", "expr": "$.Properties.If_name"},
                  {"type": "path", "name": "State_timestamp", "expr": "$.Properties.State_timestamp"},
                  {"type": "path", "name": "txb", "expr": "$.Properties.txb"},
                  {"type": "path", "name": "txp", "expr": "$.Properties.txp"},
                  {"type": "path", "name": "txd", "expr": "$.Properties.txd"},
                  {"type": "path", "name": "rxb", "expr": "$.Properties.rxb"},
                  {"type": "path", "name": "rxp", "expr": "$.Properties.rxp"},
                  {"type": "path", "name": "rxd", "expr": "$.Properties.rxd"},
                  {"type": "path", "name": "epd", "expr": "$.Properties.epd"},
                  {"type": "path", "name": "txc", "expr": "$.Properties.txc"},
                  {"type": "path", "name": "rxc", "expr": "$.Properties.rxc"},
                  {"type": "path", "name": "rxm", "expr": "$.Properties.rxm"},
                  {"type": "path", "name": "delta", "expr": "$.Properties.delta"}
                ]
              },
              "dimensionsSpec": {
                "dimensions": [],
                "dimensionExclusions": ["Event_timestamp"]
              },
              "timestampSpec": {
                "column": "State_timestamp",
                "format": "auto"
              }
            }
          },
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "hour",
            "queryGranularity": "none",
            "intervals": "no"
          },
          "metricsSpec": [
            {"type": "count", "name": "count"},
            {"name": "txb_sum", "type": "doubleSum", "fieldName": "txb"},
            {"name": "txp_sum", "type": "doubleSum", "fieldName": "txp"},
            {"name": "txd_sum", "type": "doubleSum", "fieldName": "txd"},
            {"name": "rxb_sum", "type": "doubleSum", "fieldName": "rxb"},
            {"name": "rxp_sum", "type": "doubleSum", "fieldName": "rxp"},
            {"name": "rxd_sum", "type": "doubleSum", "fieldName": "rxd"},
            {"name": "epd_sum", "type": "doubleSum", "fieldName": "epd"},
            {"name": "txc_sum", "type": "doubleSum", "fieldName": "txc"},
            {"name": "rxc_sum", "type": "doubleSum", "fieldName": "rxc"},
            {"name": "rxm_sum", "type": "doubleSum", "fieldName": "rxm"},
            {"name": "delta_sum", "type": "doubleSum", "fieldName": "delta"}
          ]
        },
        "ioConfig": {
          "type": "realtime"
        },
        "tuningConfig": {
          "type": "realtime",
          "maxRowsInMemory": "1000000",
          "intermediatePersistPeriod": "PT10M",
          "windowPeriod": "PT10M"
        }
      },
      "properties": {
        "task.partitions": "200",
        "task.replicants": "3",
        "topicPattern": "Analytics.SwitchEvent.IfStat"
      }
    }
  },
  "properties": {
    "zookeeper.connect": "10.12.3.40:2181,10.12.3.41:2181,10.12.3.42:2181,10.12.3.43:2181,10.12.3.44:2181",
    "druid.discovery.curator.path": "/druid/discovery",
    "druid.selectors.indexing.serviceName": "druid/overlord",
    "commit.periodMillis": "15000",
    "consumer.numThreads": "2",
    "kafka.zookeeper.connect": "10.12.3.40:2181,10.12.3.41:2181,10.12.3.42:2181,10.12.3.43:2181,10.12.3.44:2181",
    "kafka.group.id": "tranquility-kafka"
  }
}

Error:

2016-11-08 20:11:05,384 [KafkaConsumer-1-EventThread] INFO o.a.c.f.state.ConnectionStateManager - State change: CONNECTED
2016-11-08 20:11:05,675 [KafkaConsumer-1] INFO c.m.t.finagle.FinagleRegistry - Adding resolver for scheme[disco].
2016-11-08 20:11:08,188 [KafkaConsumer-1] INFO o.h.validator.internal.util.Version - HV000001: Hibernate Validator 5.1.3.Final
2016-11-08 20:11:08,716 [KafkaConsumer-1] INFO io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory='extensions', hadoopDependenciesDir='hadoop-dependencies', loadList=null}]
2016-11-08 20:11:09,112 [KafkaConsumer-1] ERROR c.m.tranquility.kafka.KafkaConsumer - Exception:
java.lang.IllegalArgumentException: Can not deserialize instance of java.util.ArrayList out of VALUE_STRING token
 at [Source: N/A; line: -1, column: -1]

Hi Suhas,

Can you try removing "intervals" from the following block? In a granularitySpec, "intervals" is expected to be a JSON array of ISO-8601 interval strings, so the string value "no" is exactly what makes Jackson fail with "Can not deserialize instance of java.util.ArrayList out of VALUE_STRING".

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "hour",
  "queryGranularity": "none",
  "intervals": "no"
},
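For reference, the fixed block just drops the key; for realtime ingestion through Tranquility you don't need it at all. (If you ever do need "intervals", e.g. in a batch ingestion spec, it takes an array such as ["2016-11-01/2016-12-01"] — those dates are only an example.)

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "hour",
  "queryGranularity": "none"
},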

Thanks,

Jon

Hi Suhas,

One other thing: if you are loading data from Kafka into Druid, I think you'll probably find the new exactly-once Kafka indexing service very helpful:

https://imply.io/docs/latest/tutorial-kafka-indexing-service.html
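To give a rough idea of the shape of a supervisor spec for your stream — this is only a sketch, not a drop-in config: the flattenSpec fields and metricsSpec are trimmed to a few entries for brevity (the full lists from your Tranquility config carry over unchanged), the broker address 10.12.3.40:9092 is a placeholder since I only know your ZooKeeper addresses, and "topic" assumes Analytics.SwitchEvent.IfStat is a literal topic name rather than a pattern:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "linkstate-kafka",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [
            {"type": "path", "name": "State_timestamp", "expr": "$.Properties.State_timestamp"},
            {"type": "path", "name": "Device_name", "expr": "$.Device.Device_name"},
            {"type": "path", "name": "txb", "expr": "$.Properties.txb"}
          ]
        },
        "timestampSpec": {"column": "State_timestamp", "format": "auto"},
        "dimensionsSpec": {"dimensions": [], "dimensionExclusions": ["Event_timestamp"]}
      }
    },
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "hour",
      "queryGranularity": "none"
    },
    "metricsSpec": [
      {"type": "count", "name": "count"},
      {"type": "doubleSum", "name": "txb_sum", "fieldName": "txb"}
    ]
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 1000000
  },
  "ioConfig": {
    "topic": "Analytics.SwitchEvent.IfStat",
    "consumerProperties": {"bootstrap.servers": "10.12.3.40:9092"},
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}

You would POST that spec to the overlord's /druid/indexer/v1/supervisor endpoint, and the supervisor manages the Kafka indexing tasks from there. Note there is no windowPeriod: unlike Tranquility, the Kafka indexing service can accept late and out-of-order data.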

Hi Jon,

That fixed it. Thanks!

-Suhas

Thanks FJ. Yeah, this is more useful, and exactly what I was looking for. :)

-Suhas