Rollup average calculation while data ingestion

Hi,

i created metricSpec like -

“metricsSpec”: [
{
“type”: “count”,
“name”: “count”
},
{
“type”: “longSum”,
“name”: “sum_value”,
“fieldName”: “counter_value”,
“expression”: null
},
{
“type”: “longMin”,
“name”: “min_value”,
“fieldName”: “counter_value”,
“expression”: null
},
{
“type”: “longMax”,
“name”: “max_value”,
“fieldName”: “counter_value”,
“expression”: null
}
],

Here i want to add average for counter_value field also while ingestion ,
how to calculate average for a metric column while ingesting ?

Hey!

So average is not a supported ingestion-time function. There are a number of them, in fact, including FIRST and LAST:

The way I work around this (which is by design given that Druid is a parallel ingestion database) is to create the metrics that would allow for those kinds of calculation – which in this instance are a SUM and then COUNT – thus, SUM / COUNT = AVG. So at query time you SUM the sum metric, SUM the count metric, and then do a division.

1 Like

Oh and if approximates are OK, there’s also the Quantiles Sketch

1 Like

This is actually not a workaround. If you want to take the average of a column based on rolled up data, the only way to do it is to keep the sum and count as separate metrics and compute avg = sum/count at query time. Otherwise you would calculate the average of averages, which would be wrong. Let me give you an example:
You have hourly buckets
10-11h: 1000 rows, value=1 => sum=1000, count=1000, avg=1
11-12h: 1 row, value=2 => sum=2, count=1, avg=2
Total correct avg: 1002/1001=1.001
If you take the bucket averages: (1+2)/2=1.5 which is wrong
that’s why we don’t support averages during ingestion

1 Like

Thank you for the clarity, @Hellmar_Becker !!! I’m afraid my brain wasn’t in clear writing mode…!!!