Druid Spatial Dimension Batch Ingestion

Hi All,

I’m trying to ingest spatial dimensions (lat, long) with Druid’s Hadoop batch ingestion. So far, the only way I’ve gotten it to work is to list the lat and long values as floats in the dimensions field and also reference them in the spatialDimensions field of the dimensionsSpec.

But if I understand correctly, this is terribly inefficient because the cardinality of lat/long values is extremely high. So the question is: is there a way to get spatialDimensions working without also listing lat and long as dimensions? If so, what should the dimensionsSpec look like?

My test data:

{
  "timestamp": 1494842519,
  "a": "abc",
  "count": 10,
  "lat": "38.832401275634766",
  "long": "-76.90840148925781",
  "coord123": [
    "38.832401275634766",
    "-76.90840148925781"
  ]
}

And this is the working but inefficient ingestion spec I mentioned:

{
  "dataSchema": {
    "dataSource": "geo-test",
    "parser": {
      "type": "hadoopyString",
      "parseSpec": {
        "format": "json",
        "dimensionsSpec": {
          "dimensions": [
            { "name": "a", "type": "string" },
            { "name": "lat", "type": "float" },
            { "name": "long", "type": "float" }
          ],
          "spatialDimensions": [
            {
              "dimName": "coordinates",
              "dims": ["lat", "long"]
            }
          ]
        },
        "timestampSpec": {
          "format": "posix",
          "column": "timestamp"
        }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "count" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": { "type": "none" },
      "rollup": true,
      "intervals": [
        "2017-05-15T10:00:00.000Z/2017-05-15T11:00:00.000Z"
      ]
    }
  }
}

If possible, I’d like to remove "lat" and "long" from the dimensions list.
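To make the goal concrete, the dimensionsSpec I’m after would look roughly like the sketch below: the same spec as above with "lat" and "long" dropped from dimensions, so they appear only under spatialDimensions. This is just the target shape, assuming spatialDimensions can read "lat" and "long" directly from the input row; it is not a configuration confirmed to work with Hadoop batch ingestion.

```json
{
  "dimensionsSpec": {
    "dimensions": [
      { "name": "a", "type": "string" }
    ],
    "spatialDimensions": [
      {
        "dimName": "coordinates",
        "dims": ["lat", "long"]
      }
    ]
  }
}
```

Everything else (timestampSpec, metricsSpec, granularitySpec) would stay the same as in the working spec.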

I’ve tried all sorts of different combinations but still can’t figure this out. If you have experience ingesting spatial dimensions with Hadoop batch ingestion, please let me know!

Best,

Augustus