How to define Integer/Number data types when using flattenSpec

Hi Guys,

I am learning Druid. My problem is that I need to define the data types of dimensions when using a flattenSpec.

My JSON data is not fixed: it can have any number of keys, as in the sample below. A few values are integers and the rest are strings.

{
  "timestamp": "",
  "ATTRIBUTE_1": 1,
  "ATTRIBUTE_2": "abcs"
}

For the data above, I specified the following schema. I am using Kafka ingestion.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "dimension_flat_spec_poc",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": []
        },
        "dimensionsSpec": {
          "dimensions": []
        }
      }
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE",
      "rollup": false
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "reportParseExceptions": false
  },
  "ioConfig": {
    "topic": "kafka_topic",
    "replicas": 2,
    "taskDuration": "PT10M",
    "completionTimeout": "PT20M",
    "consumerProperties": {
      "bootstrap.servers": "<kafka broker>"
    }
  }
}

But all the data is stored as VARCHAR: every dimension has the VARCHAR data type, and I want a few attributes to be stored as integers.

What are the possible ways to achieve this without predefining the attributes in dimensionsSpec?

Thanks in advance!

Hi -
Currently, if you're using JSON with useFieldDiscovery, everything not explicitly defined defaults to string. Using Avro with a schema registry would better facilitate true schema evolution.

- Ben

If what you said is true ("Few would be of Integer type and other String"), then there is a trick you can use:

You can define your few integer columns as metrics, even though you do not use rollup.

So if you know that a, b, and c will be integers, you could add:

"metricsSpec": [
  { "name": "a", "type": "longSum", "fieldName": "a" },
  { "name": "b", "type": "longSum", "fieldName": "b" },
  { "name": "c", "type": "longSum", "fieldName": "c" }
]

This ensures that these fields are ingested as longs. The rest of the dimensions will be auto-detected.
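
If the integer keys are not known up front, one option is to generate the metricsSpec entries from a handful of sample messages before submitting the spec. The following is only a rough sketch of that idea, not something Druid does for you: the helper name `infer_long_metrics` and the sample timestamps are made up for illustration, and it only looks at top-level keys (nested fields reached through flattenSpec are not handled).

```python
import json


def infer_long_metrics(sample_messages, timestamp_column="timestamp"):
    """Build longSum metricsSpec entries for keys whose values are always integers.

    Hypothetical helper: scans sample JSON messages offline; Druid itself
    does not call or know about this function.
    """
    always_int = {}
    for raw in sample_messages:
        record = json.loads(raw)
        for key, value in record.items():
            if key == timestamp_column:
                continue
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            # A key qualifies only if every observed value was an integer.
            always_int[key] = always_int.get(key, True) and is_int
    return [
        {"name": key, "type": "longSum", "fieldName": key}
        for key in sorted(always_int)
        if always_int[key]
    ]


# Sample messages shaped like the data in the question (timestamps invented).
samples = [
    '{"timestamp": "2019-01-01T00:00:00Z", "ATTRIBUTE_1": 1, "ATTRIBUTE_2": "abcs"}',
    '{"timestamp": "2019-01-01T00:01:00Z", "ATTRIBUTE_1": 7, "ATTRIBUTE_3": 42}',
]
metrics_spec = infer_long_metrics(samples)
print(json.dumps(metrics_spec, indent=2))
```

The resulting list can be pasted into "metricsSpec" in the supervisor spec. A key that first appears after the spec was submitted would still be discovered as a string dimension, so the spec would need to be regenerated and resubmitted when new integer keys show up.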