Get latest row (is there a 'last' aggregator?)

Hi,

I have a use case where I want to get the latest/last row ingested into Druid.
What aggregator can I use to do that?

My data model is as follows:
timestamp, customer_id, balance
Here, customer_id is a dimension and balance is a metric. I want the most recently sent balance value to be stored in and retrieved from Druid.

Please guide me on how I can achieve the same.

Thanks,
GK

Hi, this question has come up before in the forums, but I can’t find those threads right now.

If you think about the SQL queries you want to make, you may want to submit a feature request to https://github.com/implydata/plyql if the feature does not already exist. Doing this in Druid itself will require using JavaScript aggregators.

Are you just looking for SELECT * FROM table ORDER BY __time DESC?

Oops, forgot the limit: SELECT * FROM table ORDER BY __time DESC LIMIT 1

Thanks for the SQL suggestion, it works. I looked at the query JSON it generated, and it uses a select query with a dummy metric aggregation.
However, my concern is that Druid would still store all the rows in its index, and we would only use SELECT to retrieve the last one. Is there a way for Druid to store only the last row itself? If a JavaScript aggregator can be used, how might one go about that? Something like passing both __time and the balance column to the JS function?

I found the earlier thread: https://groups.google.com/d/topic/druid-user/nCghqQBhUC8/discussion
The code example provided there does not seem to work for me. I think the problem is in fnCombine: it adds the two partial results, which essentially makes it a ‘sum’ aggregator.

{
  "type": "javascript",
  "name": "last",
  "fieldNames": ["dim1"],
  "fnAggregate": "function(current, dim1) { return dim1; }",
  "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset": "function() { return 0; }"
}
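
To illustrate the suspicion about fnCombine, here is a minimal sketch in plain JavaScript that only simulates how the three callbacks might be applied. It does not call Druid. The runAggregator helper, the sample balances, and the two-segment split are all hypothetical. The second variant, which keeps partialB, only gives the "last" value if partials happen to be combined in time order. That ordering is an assumption of this sketch, not something confirmed in this thread.

```javascript
// Simulation only: runAggregator is a hypothetical helper, not a Druid API.
// It seeds each segment with fnReset, folds rows in with fnAggregate,
// then merges the per-segment partials with fnCombine.
function runAggregator(agg, segments) {
  const partials = segments.map(function (rows) {
    return rows.reduce(function (current, value) {
      return agg.fnAggregate(current, value);
    }, agg.fnReset());
  });
  return partials.reduce(function (a, b) {
    return agg.fnCombine(a, b);
  }, agg.fnReset());
}

// Same callbacks as the spec quoted above: fnCombine adds the partials.
const sumCombine = {
  fnAggregate: function (current, value) { return value; },
  fnCombine: function (partialA, partialB) { return partialA + partialB; },
  fnReset: function () { return 0; }
};

// Variant that keeps the later partial.
// ASSUMPTION: partials arrive in time order.
const lastCombine = {
  fnAggregate: sumCombine.fnAggregate,
  fnCombine: function (partialA, partialB) { return partialB; },
  fnReset: sumCombine.fnReset
};

// Hypothetical balance values, split across two segments in time order.
const segments = [[100, 120], [90, 75]];

const summed = runAggregator(sumCombine, segments); // last of each segment, added together
const last = runAggregator(lastCombine, segments);  // last value of the final segment
console.log(summed, last);
```

With the additive fnCombine, the result is the sum of each segment's last value rather than the single latest balance, which would match the behaviour described above.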

If all you want to store is the last row, why not just configure Druid with very aggressive data retention rules?