Flattened JSON dimensions not present

Hi all,

I am trying to use the JSON flattenSpec when ingesting data through Tranquility Kafka (Imply distribution 1.2.1). All of the root-level dimensions are discovered, but none of the nested ones appear. For example, with the following configuration, client_logs does not contain a request_type dimension. Incidentally, the documentation indicates that the field type should be "nested", however the code accepts "path".

{
  "dataSources" : {
    "client_logs" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "client_logs",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "flattenSpec" : {
                "useFieldDiscovery" : true,
                "fields" : [
                  {
                    "type" : "path",
                    "name" : "request_type",
                    "expr" : "$.shopkick_request_details.request_type"
                  }
                ]
              },
              "dimensionsSpec" : {
                "dimensions" : []
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "hour",
            "queryGranularity" : "none"
          },
          "metricsSpec" : [
            {
              "type" : "count",
              "name" : "count"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "reportParseExceptions" : true,
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1",
        "topicPattern" : "t1_sor_client_log"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "myserver101",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "commit.periodMillis" : "15000",
    "consumer.numThreads" : "1",
    "kafka.zookeeper.connect" : "myserver101",
    "kafka.group.id" : "tranquility-kafka"
  }
}

My understanding is that an empty dimensions array in the dimensionsSpec should pick up all dimensions, but it does not pick up request_type. If I add request_type there, the discovered fields are no longer all picked up and request_type is always null. The events look like:

{
  "client_app_version": "shopkick/4.8.8",
  "server_ip": "127.0.0.1",
  "user_id": 1,
  "server_name": "myserver004",
  "remote_ip_address": "127.0.0.1",
  "timestamp": 1462380307239,
  "response_code": 200,
  "shopkick_request_details": {
    "request_type": 3,
    "award_details": {}
  },
  "user_location": {
    "latitude": 51.048774100000003,
    "coord_timestamp": 1462380272815,
    "accuracy": 1100.0,
    "longitude": 8.5282112000000007,
    "curr_timestamp": 1462380274303
  },
  "session_ts": "1462380272791",
  "language": 2,
  "device_details": {
    "device_kcid": "blah",
    "screen_width": 1080,
    "device_type": "samsung/SM-N9005",
    "device_os": "Android/5.0",
    "screen_height": 1920,
    "device_id": "anotherblah"
  },
  "experiment_id": 0,
  "locale": "en-US",
  "response_time_ms": 48
}
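To make the expectation concrete, here is a rough Python sketch of the row this event should turn into. It is only a simplified model and not Druid's actual parser: resolve_path and flatten are made-up helper names, only plain dotted paths like $.a.b are handled, and it assumes field discovery keeps root-level values that are not nested objects.

```python
import json

def resolve_path(event, expr):
    """Resolve a plain dotted path such as '$.a.b'; return None if a key is missing."""
    if not expr.startswith("$."):
        raise ValueError("only simple '$.a.b' paths are supported in this sketch")
    node = event
    for key in expr[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def flatten(event, path_fields):
    """Keep root-level non-object values (assumed discovery behaviour), then add path fields."""
    row = {k: v for k, v in event.items() if not isinstance(v, dict)}
    for field in path_fields:
        row[field["name"]] = resolve_path(event, field["expr"])
    return row

# Abbreviated copy of the event above.
event = json.loads("""
{
  "client_app_version": "shopkick/4.8.8",
  "timestamp": 1462380307239,
  "response_code": 200,
  "shopkick_request_details": {"request_type": 3, "award_details": {}}
}
""")

fields = [{"type": "path", "name": "request_type",
           "expr": "$.shopkick_request_details.request_type"}]

row = flatten(event, fields)
print(row["request_type"])                # 3
print("shopkick_request_details" in row)  # False
```

So with this spec the expected result is a request_type dimension with value 3 next to the root-level fields.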

Any help would be greatly appreciated!

Thanks,

–Ben

Hi Ben,

Are you still running into problems with this?

I tried ingesting that example row with the provided spec, using tranquility/kafka in imply 1.2.1, and I was able to see the root-level dimensions as well as the ‘request_type’ nested field.

Thanks,

Jon

My apologies, I just saw this message. It was user error: I had some messages in my Kafka queue with an older schema that did not have request_type. So I was sometimes getting it and sometimes not, which was very confusing.

Thanks,

–Ben
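For anyone who sees the same "sometimes there, sometimes null" symptom, a quick check along these lines can show whether some messages simply lack the nested field. This is a minimal sketch: has_path and count_missing are hypothetical helper names, and the second sample message is an invented stand-in for an old-schema event, not one taken from the real topic.

```python
import json

def has_path(event, dotted_path):
    """Return True if every key along 'a.b.c' exists in the nested dict."""
    node = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True

def count_missing(raw_messages, dotted_path):
    """Count JSON messages that do not contain the given nested field."""
    return sum(1 for raw in raw_messages
               if not has_path(json.loads(raw), dotted_path))

messages = [
    # Current schema: nested request_type is present.
    '{"timestamp": 1462380307239, "shopkick_request_details": {"request_type": 3}}',
    # Invented old-schema message: no shopkick_request_details at all.
    '{"timestamp": 1462380307240, "user_id": 1}',
]

missing = count_missing(messages, "shopkick_request_details.request_type")
print(missing)  # 1
```

Any non-zero count means at least some rows will come out with a null request_type, regardless of how the flattenSpec is written.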

Hi Jon,

Sorry to add to this old thread, but I was trying to flatten the JSON from the example here because it was mentioned that it works, but it does not work for me for some reason.

I'm using Druid 0.10.0 with Tranquility 0.8.0.

Should this work with the normal HTTP post? I appreciate your help in advance. Thanks!

curl -XPOST -H 'Content-Type: application/json' --data '{"client_app_version":"shopkick/4.8.8","server_ip":"127.0.0.1","user_id":1,"server_name":"myserver004","remote_ip_address":"127.0.0.1","timestamp":"2017-06-28T09:38:35Z","response_code":200,"shopkick_request_details":{"request_type":3,"award_details":{}},"user_location":{"latitude":51.0487741,"coord_timestamp":1462380272815,"accuracy":1100,"longitude":8.528211200000001,"curr_timestamp":1462380274303},"session_ts":"1462380272791","language":2,"device_details":{"device_kcid":"blah","screen_width":1080,"device_type":"samsung/SM-N9005","device_os":"Android/5.0","screen_height":1920,"device_id":"anotherblah"},"experiment_id":0,"locale":"en-US","response_time_ms":48}' http://localhost:8200/v1/post/client_logs

It doesn't seem to be flattened. The segment metadata query (curl -XPOST -H 'Content-Type:application/json' -d '{"queryType" : "segmentMetadata","dataSource": "client_logs"}' http://localhost:8082/druid/v2/?pretty) gives me this:

[
  {
    "id": "client_logs_2017-06-28T09:00:00.000Z_2017-06-28T10:00:00.000Z_2017-06-28T09:39:49.382Z",
    "intervals": [
      "2017-06-28T09:00:00.000Z/2017-06-28T09:38:35.001Z"
    ],
    "columns": {
      "__time": {
        "type": "LONG",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": null,
        "minValue": null,
        "maxValue": null,
        "errorMessage": null
      },
      "client_app_version": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "shopkick/4.8.8",
        "maxValue": "shopkick/4.8.8",
        "errorMessage": null
      },
      "count": {
        "type": "LONG",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": null,
        "minValue": null,
        "maxValue": null,
        "errorMessage": null
      },
      "device_details": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "{device_id=anotherblah, device_os=Android/5.0, screen_width=1080, device_type=samsung/SM-N9005, screen_height=1920, device_kcid=blah}",
        "maxValue": "{device_id=anotherblah, device_os=Android/5.0, screen_width=1080, device_type=samsung/SM-N9005, screen_height=1920, device_kcid=blah}",
        "errorMessage": null
      },
      "experiment_id": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "0",
        "maxValue": "0",
        "errorMessage": null
      },
      "language": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "2",
        "maxValue": "2",
        "errorMessage": null
      },
      "locale": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "en-US",
        "maxValue": "en-US",
        "errorMessage": null
      },
      "remote_ip_address": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "127.0.0.1",
        "maxValue": "127.0.0.1",
        "errorMessage": null
      },
      "response_code": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "200",
        "maxValue": "200",
        "errorMessage": null
      },
      "response_time_ms": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "48",
        "maxValue": "48",
        "errorMessage": null
      },
      "server_ip": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "127.0.0.1",
        "maxValue": "127.0.0.1",
        "errorMessage": null
      },
      "server_name": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "myserver004",
        "maxValue": "myserver004",
        "errorMessage": null
      },
      "session_ts": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "1462380272791",
        "maxValue": "1462380272791",
        "errorMessage": null
      },
      "shopkick_request_details": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "{request_type=3, award_details={}}",
        "maxValue": "{request_type=3, award_details={}}",
        "errorMessage": null
      },
      "user_id": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "1",
        "maxValue": "1",
        "errorMessage": null
      },
      "user_location": {
        "type": "STRING",
        "hasMultipleValues": false,
        "size": 0,
        "cardinality": 1,
        "minValue": "{latitude=51.0487741, accuracy=1100, coord_timestamp=1462380272815, longitude=8.528211200000001, curr_timestamp=1462380274303}",
        "maxValue": "{latitude=51.0487741, accuracy=1100, coord_timestamp=1462380272815, longitude=8.528211200000001, curr_timestamp=1462380274303}",
        "errorMessage": null
      }
    },
    "size": 0,
    "numRows": 1,
    "aggregators": null,
    "timestampSpec": null,
    "queryGranularity": null,
    "rollup": null
  }
]
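The giveaway in this output is that the nested objects show up as single string columns whose values look like {key=value, ...}, and there is no request_type column. A small Python sketch of that check is below, run against an abbreviated copy of the response. The helper name unflattened_columns is made up for illustration, and treating a minValue that starts with "{" as a stringified nested object is only a heuristic.

```python
import json

def unflattened_columns(segment_metadata):
    """Return column names whose minValue looks like a stringified nested object."""
    suspects = set()
    for segment in segment_metadata:
        for name, info in segment["columns"].items():
            min_value = info.get("minValue")
            if isinstance(min_value, str) and min_value.startswith("{"):
                suspects.add(name)
    return sorted(suspects)

# Abbreviated copy of the segmentMetadata response above.
response = json.loads("""
[
  {
    "columns": {
      "client_app_version": {"type": "STRING", "minValue": "shopkick/4.8.8"},
      "shopkick_request_details": {"type": "STRING",
                                   "minValue": "{request_type=3, award_details={}}"},
      "count": {"type": "LONG", "minValue": null}
    },
    "numRows": 1
  }
]
""")

suspects = unflattened_columns(response)
has_request_type = "request_type" in response[0]["columns"]
print(suspects)          # ['shopkick_request_details']
print(has_request_type)  # False
```

If the flattenSpec had been applied, I would expect request_type to be listed as its own column instead.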

Hi,

Were you able to get this working? I am running into a similar problem: non-nested fields work, but nested fields are not flattened out.