Hy everyone,
Where looking into using Apache Druid to solve our current use case.
We have no experience with druid (yet)
We have ship location data being pushed on a kafka.
Each ships sends trough its location once a minute or maybe less. There is data for hundres to few thousend of ships being pushed.
Our use case is to visualize the ships an a map and make it possible for the users to go back in time an look at the map from months to hours ago.
The data in Apache druid looks like this.
{ “id”: “3001”,
“timestamp”: “2021-05-11T12:12:45.473Z”,
“assetType”: “SHIP_LOCATION”,
“longitude”: 1000,
“latitude”: 1000,
“metaData”: “some meta data”}
each ship has a unique ID and a location.
so we would like to retrieve the data within a bound and get the latest possible data for a given timestamp (we’ll start with ‘NOW’/near realtime )
we got a SQL version working, but think there should be a better way (native query?) to solve this.
SELECT * FROM SHIP_LOCATION tbl
Inner join (
SELECT max(__time) t , id
FROM SHIP_LOCATION
where latitude BETWEEN 500 AND 1500
and longitude BETWEEN 500 AND 3000
GROUP BY id
) sub
on tbl.__time = sub.t
and tbl.id = sub.id
I could not find any examples that cover this kind of query.
Any pointers on how we can solve this in a performant way with a native query are greatly appriciated.
Thanks in advance