Handling nested JSON not working as expected

Hi

I just discovered that Druid 0.9 can now handle nested JSON objects, using JSONPath expressions to find the values you’re interested in.

I have a JSON event like this:

{
  "eventType": 0,
  "key": {
    "sessionId": -3361044265461577625,
    "affiliateId": 3,
    "publisherId": 4,
    "contentId": 4,
    "adNetworkId": 1
  },
  "keyName": {
    "affiliateName": "aff3",
    "publisherName": "pub4",
    "contentName": "content4",
    "adNetworkName": "adNet1",
    "engineName": "engine0"
  },
  "version": "iOs-SDK-1.0",
  "countryCode": "US",
  "distributorId": 0,
  "dateCreated": 1443495600000,
  "modelName": "G3",
  "nokiaSeries": 0,
  "deviceOs": "Ice Cream Sandwich",
  "deviceOsVersion": "4.1",
  "brandName": "Samsung",
  "supportsFullScreen": 1,
  "requestType": 0,
  "locationType": 1,
  "values": {
    "getImpressionsRatio": 0.5,
    "soldRatio": 0.5,
    "publisherGross": 1.0,
    "publisherNet": 0.5305187914050087,
    "affiliateGross": 1.0,
    "affiliateNet": 1.0,
    "iaGross": 1.0,
    "iaNet": 1.0,
    "occurence": 2
  },
  "day": "2015-09-29",
  "hour": "06",
  "table": "aggregation"
}


Following the documentation, I wrote a Tranquility configuration file to extract and ingest this JSON from a Kafka stream.

My schema looks like this:

{
  "dataSources": {
    "aggregation-event": {
      "spec": {
        "dataSchema": {
          "dataSource": "aggregation-event",
          "parser": {
            "type": "string",
            "flattenSpec": {
              "useFieldDiscovery": false,
              "fields": [
                { "type": "root", "name": "eventType", "expr": "eventType" },
                { "type": "nested", "name": "publisherId", "expr": "$.key.publisherId" },
                { "type": "nested", "name": "contentId", "expr": "$.key.contentId" },
                { "type": "nested", "name": "adNetworkId", "expr": "$.key.adNetworkId" },
                { "type": "root", "name": "countryCode", "expr": "countryCode" },
                { "type": "nested", "name": "publisherNet", "expr": "$.values.publisherNet" },
                { "type": "nested", "name": "iaGross", "expr": "$.values.iaGross" },
                { "type": "nested", "name": "iaNet", "expr": "$.values.iaNet" },
                { "type": "nested", "name": "occurence", "expr": "$.values.occurence" }
              ]
            },
            "parseSpec": {
              "format": "json",
              "timestampSpec": {
                "column": "dateCreated",
                "format": "auto"
              },
              "dimensionsSpec": {
                "dimensions": ["eventType", "publisherId", "contentId", "adNetworkId", "countryCode"],
                "dimensionExclusions": []
              }
            }
          },
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "hour",
            "queryGranularity": "none"
          },
          "metricsSpec": [
            { "type": "count", "name": "count" },
            { "name": "publisherNet", "type": "doubleSum", "fieldName": "publisherNet" },
            { "name": "iaNet", "type": "doubleSum", "fieldName": "iaNet" },
            { "name": "iaGross", "type": "doubleSum", "fieldName": "iaGross" },
            { "name": "occurence", "type": "longSum", "fieldName": "occurence" }
          ]
        },
        "ioConfig": {
          "type": "realtime"
        },
        "tuningConfig": {
          "type": "realtime",
          "maxRowsInMemory": "100000",
          "intermediatePersistPeriod": "PT10M",
          "windowPeriod": "PT10M"
        }
      },
      "properties": {
        "task.partitions": "1",
        "task.replicants": "1",
        "topicPattern": "carpetTopic"
      }
    }
  },
  "properties": {
    "zookeeper.connect": "127.0.0.1:2181",
    "druid.discovery.curator.path": "/druid/discovery",
    "druid.selectors.indexing.serviceName": "druid/overlord",
    "commit.periodMillis": "15000",
    "consumer.numThreads": "2",
    "kafka.zookeeper.connect": "127.0.0.1:2181",
    "kafka.group.id": "tranquility-kafka"
  }
}


Everything seems OK: the events are consumed by Tranquility and sent to Druid, and the row count is correct, but the values extracted for all the nested fields are null.

I’ve checked the JSONPath expressions and they seem to work as expected on the JSON.
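
A check of this kind can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `resolve` helper is hypothetical, it handles only simple `$.a.b` dotted paths, and it is not the JSONPath implementation that Druid or Tranquility uses. The event is a trimmed copy of the one above.

```python
import json

# Trimmed copy of the event above: only the fields the flattenSpec refers to.
EVENT = json.loads("""
{
  "eventType": 0,
  "key": {"publisherId": 4, "contentId": 4, "adNetworkId": 1},
  "countryCode": "US",
  "dateCreated": 1443495600000,
  "values": {"publisherNet": 0.5305187914050087, "iaGross": 1.0, "iaNet": 1.0, "occurence": 2}
}
""")

# The "nested" expressions from the flattenSpec.
EXPRESSIONS = [
    "$.key.publisherId",
    "$.key.contentId",
    "$.key.adNetworkId",
    "$.values.publisherNet",
    "$.values.iaGross",
    "$.values.iaNet",
    "$.values.occurence",
]


def resolve(event, expr):
    """Hypothetical helper: resolve a simple '$.a.b' path, or return None if a key is missing."""
    if not expr.startswith("$."):
        raise ValueError("only simple '$.a.b' paths are supported: %r" % expr)
    node = event
    for part in expr[2:].split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Print what each expression resolves to on the sample event.
for expr in EXPRESSIONS:
    print(expr, "->", resolve(EVENT, expr))
```

If every expression resolves to a value here, the paths themselves match the event structure, which points at the ingestion side rather than at the expressions.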

Not many people seem to be using nested JSON, so it’s very difficult to find help.

Thanks for any help

Hi Richard,

Can you try moving the “flattenSpec” block into the “parseSpec”?

  • Jon

Hi

Here is the config file now:

{
  "dataSources": {
    "aggregation-event": {
      "spec": {
        "dataSchema": {
          "dataSource": "aggregation-event",
          "parser": {
            "type": "string",
            "parseSpec": {
              "format": "json",
              "flattenSpec": {
                "useFieldDiscovery": false,
                "fields": [
                  { "type": "root", "name": "dateCreated", "expr": "dateCreated" },
                  { "type": "root", "name": "eventType", "expr": "eventType" },
                  { "type": "nested", "name": "publisherId", "expr": "$.key.publisherId" },
                  { "type": "nested", "name": "contentId", "expr": "$.key.contentId" },
                  { "type": "nested", "name": "adNetworkId", "expr": "$.key.adNetworkId" },
                  { "type": "root", "name": "countryCode", "expr": "countryCode" },
                  { "type": "nested", "name": "publisherNet", "expr": "$.values.publisherNet" },
                  { "type": "nested", "name": "iaGross", "expr": "$.values.iaGross" },
                  { "type": "nested", "name": "iaNet", "expr": "$.values.iaNet" },
                  { "type": "nested", "name": "occurence", "expr": "$.values.occurence" }
                ]
              },
              "timestampSpec": {
                "column": "dateCreated",
                "format": "auto"
              },
              "dimensionsSpec": {
                "dimensions": ["eventType", "publisherId", "contentId", "adNetworkId", "countryCode"],
                "dimensionExclusions": []
              }
            }
          },
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "hour",
            "queryGranularity": "none"
          },
          "metricsSpec": [
            { "type": "count", "name": "count" },
            { "name": "pubNet", "type": "doubleSum", "fieldName": "publisherNet" },
            { "name": "iaNetSum", "type": "doubleSum", "fieldName": "iaNet" },
            { "name": "iaGrossSum", "type": "doubleSum", "fieldName": "iaGross" },
            { "name": "occurenceEvent", "type": "longSum", "fieldName": "occurence" }
          ]
        },
        "ioConfig": {
          "type": "realtime"
        },
        "tuningConfig": {
          "type": "realtime",
          "maxRowsInMemory": "100000",
          "intermediatePersistPeriod": "PT200M",
          "windowPeriod": "PT200M"
        }
      },
      "properties": {
        "task.partitions": "1",
        "task.replicants": "1",
        "topicPattern": "carpetTopic"
      }
    }
  },
  "properties": {
    "zookeeper.connect": "127.0.0.1:2181",
    "druid.discovery.curator.path": "/druid/discovery",
    "druid.selectors.indexing.serviceName": "druid/overlord",
    "commit.periodMillis": "15000",
    "consumer.numThreads": "2",
    "kafka.zookeeper.connect": "127.0.0.1:2181",
    "kafka.group.id": "tranquility-kafka"
  }
}


The flattenSpec is now inside the parseSpec, but still no values are extracted from the JSON: all the values from nested fields are null.

Could you help me understand how to debug the extraction in the Tranquility server, or how to change the log level, so I can see why it is not able to extract the values using the JSONPath expressions?

Thanks

Hey Richard,

This feature is still a work in progress in Tranquility; please watch this issue for updates: https://github.com/druid-io/tranquility/issues/113

I expect this to be included in the next release.
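
In the meantime, one possible workaround (an assumption, not something confirmed in this thread) is to flatten each event before it is written to the Kafka topic, so that only root-level fields need to be read. The sketch below is hypothetical: `flatten_event` and `NESTED_FIELDS` are illustrative names, and the paths simply mirror the flattenSpec posted above.

```python
import json

# Output field name -> path inside the nested event (mirrors the flattenSpec above).
NESTED_FIELDS = {
    "publisherId": ("key", "publisherId"),
    "contentId": ("key", "contentId"),
    "adNetworkId": ("key", "adNetworkId"),
    "publisherNet": ("values", "publisherNet"),
    "iaGross": ("values", "iaGross"),
    "iaNet": ("values", "iaNet"),
    "occurence": ("values", "occurence"),
}


def flatten_event(event):
    """Hypothetical helper: copy root-level scalars and lift the selected nested fields."""
    # Keep every top-level field that is not itself a nested object.
    flat = {k: v for k, v in event.items() if not isinstance(v, dict)}
    for name, path in NESTED_FIELDS.items():
        node = event
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
        flat[name] = node  # None if the path is missing
    return flat


if __name__ == "__main__":
    sample = {
        "eventType": 0,
        "key": {"publisherId": 4, "contentId": 4, "adNetworkId": 1},
        "countryCode": "US",
        "dateCreated": 1443495600000,
        "values": {"publisherNet": 0.5, "iaGross": 1.0, "iaNet": 1.0, "occurence": 2},
    }
    # The flattened JSON string is what would be published to the topic.
    print(json.dumps(flatten_event(sample)))
```

With events flattened this way, the dimensions and metrics can be referenced by their top-level names without any flattenSpec.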