Get timestamp for the latest data available

Hi,

I see that to get the latest timestamp for the data indexed into Druid for a particular datasource, there are two options:

  1. An API call to the Coordinator
  1. An API call to the Broker:
{
    "queryType" : "timeBoundary",
    "dataSource": "sample_datasource",
    "bound"     : "maxTime"
}

as described in the timeBoundary documentation (https://druid.apache.org/docs/latest/querying/timeboundaryquery.html). There, "bound" is optional ("maxTime" or "minTime"; if it is not set, both timestamps are returned), and an optional "filter" such as { "type": "and", "fields": [<filter>, <filter>, ...] } can also be added.
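A minimal Python sketch of sending a timeBoundary query like the one above to the Broker, using only the standard library. The Broker address (localhost on the default Broker port 8082), the native query path /druid/v2/ and the helper names are assumptions for illustration; the parser expects the response shape shown in the timeBoundary documentation (a list with one object whose "result" holds minTime/maxTime).

```python
import json
import urllib.request

# Hypothetical Broker location; 8082 is the Broker's default plaintext port.
BROKER_URL = "http://localhost:8082/druid/v2/"


def time_boundary_query(datasource, bound=None):
    """Build a native timeBoundary query; bound is "maxTime", "minTime" or None (both)."""
    query = {"queryType": "timeBoundary", "dataSource": datasource}
    if bound is not None:
        if bound not in ("maxTime", "minTime"):
            raise ValueError("bound must be 'maxTime' or 'minTime'")
        query["bound"] = bound
    return query


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_max_time(response):
    """Read maxTime from [{"timestamp": ..., "result": {"maxTime": ...}}]."""
    if not response:
        return None  # treat an empty result list as "no data"
    return response[0]["result"].get("maxTime")
```

For example, `extract_max_time(post_json(BROKER_URL, time_boundary_query("sample_datasource", "maxTime")))` would return the maxTime string, or None if the Broker returns an empty list.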

Both of these return a result with a field named **maxTime**.

But I see that the values returned are different:

For the example above (batch ingestion with 1-hour segments):

I see that the Coordinator API returns "maxTime": "2019-07-17T00:00:00.000Z"

whereas the Broker API returns "maxTime": "2019-07-16T23:00:00.000Z"

So the Coordinator API returns the end of the latest segment and the Broker API returns its start, am I right?

But isn't it confusing to use the same name, maxTime, for two different things?
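The post does not say which Coordinator call was used. Assuming it was the datasource summary endpoint GET /druid/coordinator/v1/datasources/{dataSourceName}, reading its "segments" summary could be sketched as below; the Coordinator address (localhost on the default Coordinator port 8081) and the helper names are hypothetical. As far as I know, that summary's minTime/maxTime come from segment interval starts and ends rather than from row timestamps, which would be consistent with the 00:00 value above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Coordinator location; 8081 is the Coordinator's default plaintext port.
COORDINATOR_URL = "http://localhost:8081"


def datasource_url(datasource):
    """URL of the (assumed) datasource summary endpoint."""
    name = urllib.parse.quote(datasource, safe="")
    return f"{COORDINATOR_URL}/druid/coordinator/v1/datasources/{name}"


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def segment_interval_bounds(info):
    """Return (minTime, maxTime) from the summary's "segments" object, if present."""
    segments = info.get("segments", {})
    return segments.get("minTime"), segments.get("maxTime")
```

Calling `segment_interval_bounds(fetch_json(datasource_url("sample_datasource")))` against a live Coordinator would give the pair of interval bounds; the second element is the value being compared with the Broker's maxTime here.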

Also, how is this maxTime computed for **real-time ingestion**?

I see that the difference is the same even for real-time ingestion: the Coordinator returns the end of the latest segment and the Broker returns its start. But since late data might arrive during real-time ingestion, how is maxTime calculated?

Hi Abhishek,

OK, there are a number of things to unpack here:

  1. Have you tried the new console? You might find it much easier to answer the questions you have there.

  2. There are more ways than what you listed :-p

There is the dataSourceMetadata query ( https://druid.apache.org/docs/latest/querying/datasourcemetadataquery.html ), which is much more efficient than timeBoundary but only gives you maxIngestedEventTime.
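A small sketch of the dataSourceMetadata query body and of reading maxIngestedEventTime from its response, assuming the response shape shown in the dataSourceMetadata documentation (a list with one object whose "result" holds maxIngestedEventTime). The helper names are hypothetical; the body would be POSTed to the Broker's native query endpoint like any other native query.

```python
import json


def datasource_metadata_query(datasource):
    """Native dataSourceMetadata query; it only reports maxIngestedEventTime."""
    return {"queryType": "dataSourceMetadata", "dataSource": datasource}


def extract_max_ingested_event_time(response):
    """Read maxIngestedEventTime from [{"timestamp": ..., "result": {...}}]."""
    if not response:
        return None  # treat an empty result list as "no data"
    return response[0]["result"].get("maxIngestedEventTime")


# JSON request body, e.g. for POST to the Broker's /druid/v2/ endpoint.
body = json.dumps(datasource_metadata_query("sample_datasource"))
```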

If you want to use Druid SQL, you could run SELECT MAX(__time) AS "maxTime" FROM tbl (see screenshot).

And if you are using Druid 0.15.0 and feeling frisky, you could run SELECT __time AS "maxTime" FROM tbl ORDER BY __time DESC LIMIT 1
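If you want to issue the two SQL statements above over HTTP rather than from the console, a sketch of building them and the request body could look like this. It assumes the Druid SQL HTTP endpoint POST /druid/v2/sql/ with a {"query": ...} body and the default "object" result format (a list of row objects keyed by column alias); the helper names are hypothetical.

```python
import json


def quote_identifier(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def max_time_sql(table):
    return f'SELECT MAX(__time) AS "maxTime" FROM {quote_identifier(table)}'


def latest_row_sql(table):
    # Per the reply above, this ORDER BY __time DESC ... LIMIT 1 form needs Druid 0.15.0.
    return f'SELECT __time AS "maxTime" FROM {quote_identifier(table)} ORDER BY __time DESC LIMIT 1'


def sql_request_body(sql):
    """JSON body for POST /druid/v2/sql/ (assumed default "object" result format)."""
    return json.dumps({"query": sql})


def extract_sql_max_time(rows):
    """Rows are dicts keyed by column alias; return the first row's maxTime."""
    return rows[0].get("maxTime") if rows else None
```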

  3. Basically, the Coordinator API gives you the end of the last segment, while all the other methods give you the timestamp of the latest event. I am willing to bet that you have queryGranularity set to HOUR, which truncates the timestamps in your data to the start of the hour; that would explain why you are only getting the start of the hour. Have a play with the other methods and see if that makes sense. Remember that you can issue all of those queries (SQL and native) via the Druid console.
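The arithmetic behind this explanation, as a small Python sketch. It assumes queryGranularity=HOUR and 1-hour segments, and the raw event time 23:42:17 is made up for illustration. Truncation stores the row's __time as 23:00, which is what timeBoundary and MAX(__time) would see, while the segment holding that row spans 23:00 to 00:00 the next day, and that end is what the Coordinator reports.

```python
from datetime import datetime, timedelta, timezone


def truncate_to_hour(ts):
    """Mimic queryGranularity=HOUR: drop minutes, seconds and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hour_segment_interval(ts):
    """Interval [start, end) of the 1-hour segment that would hold ts."""
    start = truncate_to_hour(ts)
    return start, start + timedelta(hours=1)


# Hypothetical latest raw event timestamp.
latest_event = datetime(2019, 7, 16, 23, 42, 17, tzinfo=timezone.utc)

# Stored __time after truncation: what timeBoundary / MAX(__time) would return.
stored_time = truncate_to_hour(latest_event)

# The segment's end is the value the Coordinator summary reports as maxTime.
segment_start, segment_end = hour_segment_interval(latest_event)
```

Here stored_time is 2019-07-16 23:00 UTC and segment_end is 2019-07-17 00:00 UTC, the two values reported in the original question.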

Hope this helps,

Vadim

Hi Vadim,

Thanks a lot for your detailed reply.

I see that { "queryType": "dataSourceMetadata", "dataSource": "sample_datasource" } returned "maxIngestedEventTime", but it was the same as the "maxTime" returned by the timeBoundary query.

I see that real-time ingestion creates segments for time intervals in the future with a size of 0 and then populates them with data. So I am getting a time in the future from both "maxIngestedEventTime" and "maxTime".

Is there a way to get the latest timestamp only for segments that actually contain data?