Geospatial aggregations in Druid

Hi all,
I would like to implement a geo heatmap on top of a large dataset: billions of data points per day. I was thinking about using Druid for this (and for many other OLAP queries). But to implement a geo heatmap, I need spatial aggregation, so that I can group the raw latitude/longitude pairs of those data points into cells of a map grid. Is there any support for this kind of aggregation in Druid?

Can you group by a spatial index in Druid? I’m afraid not, as the documentation mentions only spatial filtering…

But maybe someone is aware of Druid extensions that support this use case? Or practical workarounds? For example: would it be feasible to issue a thousand filtering queries, one for each cell in the grid I want to display? Probably not… :slight_smile:

And/or are there plans to implement spatial aggregations in Druid?

Extra question: if Druid unfortunately does not support this use case, does anyone have a good alternative? I know ES + Kibana has it, but I’m afraid it would not handle my scale…

Thanks for any answers and suggestions,

Krzysztof Zarzycki

Hi Krzysztof,

There is no explicit support for grouping by geo coordinates at this time. However, you can still use Druid for it if you choose a grid and, before loading the data, add a column to the datasource indicating which grid square each row is in. You could also have Druid do this grouping at query time, rather than doing it yourself before loading the data, by using an aggregator extension.
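To illustrate the "add a grid column before loading" idea, here is a minimal Python sketch of a pre-ingestion step. It is not Druid code: the field names `lat`, `lon` and `grid_cell`, the 0.5-degree cell size and both function names are assumptions made up for the example.

```python
import math


def grid_cell(lat, lon, cell_deg=0.5):
    """Return a string id for the grid square containing (lat, lon).

    Assumes a plain equirectangular grid of cell_deg x cell_deg cells,
    indexed from the south-west corner (-90, -180).
    """
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"{row}_{col}"


def add_cell_column(rows, cell_deg=0.5):
    """Yield copies of the input rows with an extra 'grid_cell' field.

    The field names are hypothetical; adapt them to your own schema.
    """
    for row in rows:
        enriched = dict(row)
        enriched["grid_cell"] = grid_cell(row["lat"], row["lon"], cell_deg)
        yield enriched


if __name__ == "__main__":
    events = [
        {"lat": 52.23, "lon": 21.01, "count": 1},
        {"lat": 52.40, "lon": 21.30, "count": 1},
    ]
    for event in add_cell_column(events):
        print(event)
```

Once `grid_cell` is loaded as an ordinary string dimension, the heatmap becomes a normal group-by on that column. The trade-off is that the grid resolution is fixed at load time, so a different zoom level would need either another column or a finer grid that is coarsened at query time.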

Just to elaborate: if you had an additional dimension column holding lat/long as a string, you could group rows by that column. More interestingly, you could use a custom extraction function on this column to group rows by the transformed value of the dimension. For example, if you had two rows with “1.1/3.2” and “1.2/3.3” in them, your extraction function could map both of those to “1/3” and so group the rows together as being in the same “proximity”.
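The mapping itself could look roughly like the Python sketch below. It only shows the logic such an extraction function would need to implement, not Druid's extraction-function API. The name `coarsen` and the choice to floor to whole degrees are assumptions for the example.

```python
import math
from collections import Counter


def coarsen(value):
    """Map a 'lat/long' string to the whole-degree cell that contains it.

    Floors each coordinate, so '1.1/3.2' and '1.2/3.3' both become '1/3'.
    Flooring (rather than truncating) keeps cells the same size on both
    sides of zero, e.g. '-0.5/3.2' becomes '-1/3'.
    """
    lat_str, lon_str = value.split("/")
    return f"{math.floor(float(lat_str))}/{math.floor(float(lon_str))}"


def group_counts(values):
    """Count rows per coarsened cell, mimicking a group-by on the result."""
    return Counter(coarsen(v) for v in values)


if __name__ == "__main__":
    rows = ["1.1/3.2", "1.2/3.3", "2.5/3.9"]
    print(group_counts(rows))
```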

– Himanshu