Getting the first row in a timeseries query

Hey,

I would like to get the value of a particular dimension for the first (and last) event in a given granularity.

Example:

If I have dimensions = [‘timestamp’, ‘dim1’]

I would like to do a TimeSeries (or GroupBy) query to get the value of dim1 for the first and last rows (the rows with the min and max timestamp) in the grouping.

{
“queryType”: “timeseries”,
“dataSource”: “sample_datasource”,
“granularity”: “day”,

“intervals”: [ “2012-01-01T00:00:00.000/2012-01-03T00:00:00.000” ]
}

``

One way I can think of doing it is with a min and max aggregator on timestamp, and then a seperate experimental select query to match exactly that timestamp. Or maybe a custom JavaScript aggregator? I was secretly hoping for an aggregator called ‘first’!

I’m pretty new to Druid, so any help would be greatly appreciated!

Thanks,

Richard

Hi richard,
there are max and min aggregators but no native first and last value aggregators at presnt.

you can do this by having two different custom javascript aggregators in a timeseries query,

first aggregator will store the first value and ignore all subsequent new values given to it.

and the last aggregator will just store the value thats been given to it and overwrites it every time its given a new value, thus having the last value given to it at any time.

Thanks very much.

For anyone who needs it. Here was my best effort on the aggregations, although there might be a better way.

{
“type”: “javascript”,
“name”: “first”,
“fieldNames” : [ “dim1”],
“fnAggregate” : "function(current, dim1) { if(current == -Number.MAX_VALUE) {return dim1} else {return current} } ",
“fnCombine” : “function(partialA, partialB) { return partialA + partialB; }”,
“fnReset” : “function() { return -Number.MAX_VALUE; }”
},
{
“type”: “javascript”,
“name”: “last”,
“fieldNames” : [ “dim1”],
“fnAggregate” : "function(current, dim1) { return dim1 } ",
“fnCombine” : “function(partialA, partialB) { return partialA + partialB; }”,
“fnReset” : “function() { return 0; }”
}

``

Is there a more straight-forward way to find the most recent row that matches a filter? I tried the below javascript aggregations and ‘last’ didn’t work. This would be very helpful as I’d like to use this information to decide what granularity to set (If I know TS data will span one month vs 5 minutes).

Thanks!

FWIW to anyone reading this: first and last aggs can not be implemented in JS because the merging order in fnCombine is not guaranteed. The solution posted by Richard, although very cool, will not work in the general case. Here is the issue: https://github.com/druid-io/druid/issues/2845 if you want to follow or +1