First / Last Row inside Timeseries with changing granularity

Hi,

I have a timeseries query in which I try to get the first and last row of the series (along with some other aggregations).

For this I use a custom JavaScript aggregator, which I got from this group.

For First Row

"type": "javascript",
"name": "Metric1First",
"fieldNames": ["metric1"],
"fnAggregate": "function aggregate(current, metric1) { if (current == -Number.MAX_VALUE) { return metric1 } else { return current } }",
"fnCombine": "function combine(partialA, partialB) { return partialA }",
"fnReset": "function reset() { return -Number.MAX_VALUE }"

For Last Row

"type": "javascript",
"name": "Metric1Last",
"fieldNames": ["metric1"],
"fnAggregate": "function aggregate(current, metric1) { return metric1 }",
"fnCombine": "function combine(partialA, partialB) { return partialA }",
"fnReset": "function reset() { return 0 }"

I use the Indexing Service, where I set the granularitySpec to:

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": {
    "type": "none"
  },
  "intervals": null
}

When I run the query with a granularity of “HOUR”, everything works fine: I get the first and last row of that hour segment.

When I run the query with a granularity of “DAY”, something goes wrong. I don’t get the right first and last row; instead I get the first / last row of the penultimate segment of the day. Every other aggregation is correct, and the count of consumed rows equals the sum of all results at hour granularity. I ran a “select” query, which showed me that the rows are in the right order. I also checked the first and last row of the last segment of that day, and the values from the day granularity query do not match those either.

As I understand it, the JavaScript runs for each row (FIFO by internal timestamp?), so it should return the first row that passes through the query: on the first call, current is -Number.MAX_VALUE, so metric1 is returned; that value then becomes current for all following calls and is no longer -Number.MAX_VALUE. The same reasoning applies to the last row. I thought it might “reset” at every new segment, but then it should return the right values at least for the last segment, which it doesn’t either.
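
To make that expectation concrete, here is a minimal sketch in plain JavaScript (not Druid code). It assumes a single pass over all rows in timestamp order with exactly one call to fnReset; the `runSinglePass` helper and the sample values are made up for illustration.

```javascript
// The two aggregators from above, written as plain objects.
const first = {
  reset: function () { return -Number.MAX_VALUE; },
  aggregate: function (current, metric1) {
    // Keep the first value seen; the sentinel marks "nothing seen yet".
    if (current == -Number.MAX_VALUE) { return metric1; } else { return current; }
  }
};
const last = {
  reset: function () { return 0; },
  // Always overwrite, so the most recent row wins.
  aggregate: function (current, metric1) { return metric1; }
};

// Hypothetical helper: one reset, then one aggregate call per row, in order.
function runSinglePass(agg, rows) {
  let current = agg.reset();
  for (const value of rows) {
    current = agg.aggregate(current, value);
  }
  return current;
}

// Made-up metric1 values, ordered by timestamp.
const rows = [11, 12, 13, 21, 22, 23];
console.log(runSinglePass(first, rows)); // 11
console.log(runSinglePass(last, rows));  // 23
```

Under this model the DAY query would return the first and last row of the whole day, which is not what I see.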

Thanks

Matthias

I read a bit more about how JavaScript aggregators work in timeseries queries. As I understand it, fnReset is triggered for every segment of the datasource within the given interval.
In my HOUR query I have only one segment (I only have a small amount of test data to verify the values), which returns the right result.

In my DAY query I have more than one segment, which means the fnReset function is triggered for each segment, and that explains the result I get.

Is this right, and if so, why does Druid trigger the fnReset function for each segment?

The only reason I can see is that Druid calculates the values for each segment separately (for parallelization) and then combines those partial results into the final value. But even then it should return the right values :)
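
Here is a small sketch of what I think happens, again in plain JavaScript rather than Druid code. It assumes each segment is aggregated on its own (starting from fnReset) and that the per-segment results are then merged pairwise with fnCombine; the `aggregateSegment` and `mergePartials` helpers, the sample values, and the merge order are my assumptions.

```javascript
// "Last row" aggregator exactly as in my query.
const last = {
  reset: function () { return 0; },
  aggregate: function (current, metric1) { return metric1; },
  combine: function (partialA, partialB) { return partialA; }
};

// Hypothetical helper: aggregate one segment on its own, starting from reset().
function aggregateSegment(agg, rows) {
  let current = agg.reset();
  for (const value of rows) {
    current = agg.aggregate(current, value);
  }
  return current;
}

// Hypothetical helper: fold the per-segment partials with combine(),
// in whatever order the partials happen to arrive.
function mergePartials(agg, partials) {
  return partials.reduce(function (a, b) { return agg.combine(a, b); });
}

// Made-up data: three hourly segments of one day, rows in timestamp order.
const segments = [[11, 12, 13], [21, 22, 23], [31, 32, 33]];
const partials = segments.map(function (rows) {
  return aggregateSegment(last, rows);
});
console.log(partials); // [ 13, 23, 33 ]

// Merged in chronological order, combine() keeps the left partial.
console.log(mergePartials(last, partials)); // 13, not 33

// Merged in a different order, the result comes from another segment.
console.log(mergePartials(last, [partials[1], partials[2], partials[0]])); // 23
```

If this is roughly how it works, a combine that always returns partialA can only give the last row when the left partial happens to be the latest one, so the result depends on the order in which the partials are merged.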

Thanks

Matthias