Ho to get the latest available data for an id

Hy everyone,

Where looking into using Apache Druid to solve our current use case.

We have no experience with druid (yet)

We have ship location data being pushed on a kafka.

Each ships sends trough its location once a minute or maybe less. There is data for hundres to few thousend of ships being pushed.

Our use case is to visualize the ships an a map and make it possible for the users to go back in time an look at the map from months to hours ago.

The data in Apache druid looks like this.

{ “id”: “3001”,
“timestamp”: “2021-05-11T12:12:45.473Z”,
“assetType”: “SHIP_LOCATION”,
“longitude”: 1000,
“latitude”: 1000,
“metaData”: “some meta data”}

each ship has a unique ID and a location.

so we would like to retrieve the data within a bound and get the latest possible data for a given timestamp (we’ll start with ‘NOW’/near realtime )

we got a SQL version working, but think there should be a better way (native query?) to solve this.

SELECT * FROM SHIP_LOCATION tbl
Inner join (
SELECT max(__time) t , id
FROM SHIP_LOCATION
where latitude BETWEEN 500 AND 1500
and longitude BETWEEN 500 AND 3000
GROUP BY id

) sub
on tbl.__time = sub.t
and tbl.id = sub.id

I could not find any examples that cover this kind of query.

Any pointers on how we can solve this in a performant way with a native query are greatly appriciated.

Thanks in advance

I would use LATEST

for instance select ship_id,latest(metadata,20) from table where where latitude BETWEEN 500 AND 1500
and longitude BETWEEN 500 AND 3000 group by 1

This will give you latest metadata in time for each ship. This avoids the join and hence will perform much better

1 Like

Thanks for the tip on latest. From there we where able to figure it out.
For future reference :

So the query looks like this at the moment.

SELECT id,  LATEST(latitude) latitude , LATEST(longitude), LATEST(metaData, 1000) FROM SHIP_LOCATION 
GROUP BY id
HAVING LATEST(latitude) BETWEEN 500 AND 3000
AND LATEST(longitude) BETWEEN 500 AND 3000

So this results in a native query of the type “groupBy” and having aggregations of type stringLast or doubleLast and having a ’ having ’ clause with a And filter to filter on the last longitude.

any extra feedback is welcome!

full json:

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "SHIP_LOCATION"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "2021-05-18T14:36:27.903Z/2021-05-18T15:36:27.903Z"
    ]
  },
  "virtualColumns": [],
  "filter": null,
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "id",
      "outputName": "d0",
      "outputType": "STRING"
    }
  ],
  "aggregations": [
    {
      "type": "stringLast",
      "name": "a0",
      "fieldName": "assetType",
      "maxStringBytes": 20
    },
    {
      "type": "doubleLast",
      "name": "a1",
      "fieldName": "latitude"
    },
    {
      "type": "doubleLast",
      "name": "a2",
      "fieldName": "longitude"
    },
    {
      "type": "stringLast",
      "name": "a3",
      "fieldName": "metaData",
      "maxStringBytes": 1000
    }
  ],
  "postAggregations": [],
  "having": {
    "type": "filter",
    "filter": {
      "type": "and",
      "fields": [
        {
          "type": "bound",
          "dimension": "a1",
          "lower": "500",
          "upper": "3000",
          "lowerStrict": false,
          "upperStrict": false,
          "extractionFn": null,
          "ordering": {
            "type": "numeric"
          }
        },
        {
          "type": "bound",
          "dimension": "a2",
          "lower": "500",
          "upper": "3000",
          "lowerStrict": false,
          "upperStrict": false,
          "extractionFn": null,
          "ordering": {
            "type": "numeric"
          }
        }
      ]
    },
    "finalize": true
  },
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "d0",
        "direction": "ascending",
        "dimensionOrder": {
          "type": "lexicographic"
        }
      }
    ],
    "limit": 100
  },
  "context": {
    "populateCache": false,
    "sqlOuterLimit": 100,
    "sqlQueryId": "8cb5587e-c2d1-4930-8836-704654e8612f",
    "useApproximateCountDistinct": false,
    "useApproximateTopN": false,
    "useCache": false
  },
  "descending": false
}