# Rollup Ingestion on weighted metric

Hi guys! I might have a silly question, but can’t fina an answer…

I have data in csv file similar like this:

``````date       | ad_network_id | ads_impressions| ads_revenue | ecpm
----------------------------------------------------------------
2021-12-20 |            10 |           2343 |      543.15 | 5.4
2021-12-20 |            10 |           3453 |      343.12 | 3.1
2021-12-20 |            10 |          12543 |       353.9 | 8.1
``````

date is timestamp

and then we have metrics:

My problem is ecpm… how ecpm can be a metric if I want it to be weighted by impressions?

After rollup I want my data to look like this:

``````__time     | ad_network_id | ads_impressions| ads_revenue | ecpm
----------------------------------------------------------------
2021-12-20 |            10 |          18339 |     1240.17 | 6.8136
``````

In this example ecpm formula:

``````ecpm = 5.4*2343/18339 + 3.1*3453 /18339 + 8.1*12543 /18339
``````

How can I make this kinda metric rollup? Any advice would be appreciated

Thank you

Here is a first approach to get the result at query time, if this helps at all:

``````WITH r AS (
FROM ecpm
GROUP BY 1, 2
)
FROM ecpm t JOIN r ON (t.__time = r.__time) AND (t.ad_network_id = r.ad_network_id)
GROUP BY 1, 2
``````

Depending on the size of the tables involved, this might need some tuning to work.

I doubt that you can make this work at ingestion time. What happens if your rollup is not perfect? All your numbers would be off because you would be dividing by a partial sum only.

1 Like

Been googling all day could not find any way to do it at ingestion time…

Know for sure making ecpm a doubleSum metric does not work… that only sums ecpm, which is not what I need…

Tried different transforms, but there is just no way to get full sum of impressions (or at least not as I know right now…)

Yes, the only option is just leave ecpm as a dimension & do it at query time. Your query @Hellmar_Becker does work, but I thought I can reduce storage footprint by reducing those three rows into one row… (in production its not gonna be 3 rows, but thousands…)

Is there any way to get summed impressions at ingestion time? cuz then I could just do transform:

``````"transforms": [
{
"type": "expression",
"name": "ecpm",
}
]
``````

ads_impressions_summed would that 18,339 number (all summed impressions)

If you enable Javascript, you can write a metric definition like this:

``````        {
"name": "custom_agg",
"type": "javascript",
"fieldNames": [
"ecpm"
],
"fnAggregate": "function(current, a, b) { return current + a * b; }",
"fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
"fnReset": "function() { return 0.0; }"
}
``````

which sums up `(ads_impressions * ecpm)`. You would still have to keep the sum of impressions as a separate metric and do the division at query time, because (as I understand) the Javascript aggregator can handle only single floating point return types. But that way you can do the rollup.

I will continue digging.

Come to think of it, would this do:

Define a transform to compute the total spend per row as (impressions * ecpm) and sum this one up

``````      "metricsSpec": [
{
"name": "count",
"type": "count"
},
{
"type": "longSum",
},
{
"type": "longSum",
},
{
"type": "doubleSum",
},
{
"name": "sum_ecpm",
"type": "doubleSum",
"fieldName": "ecpm"
},
{
"type": "doubleSum",
}
],
"transformSpec": {
"transforms": [
{
"type": "expression",
"expression": "1.0 * ads_impressions * ecpm"
}
]
}
``````

then you compute the overall eCPM as (ads_spend / sum_ads_impressions) at query time?

1 Like

Yes man! It worked like Magic! Thank you very much

Now storage footprint is more than 10 times smaller.
From my previous example, data in csv file look like this:

``````date       | ad_network_id | ads_impressions| ads_revenue | ecpm
----------------------------------------------------------------
2021-12-20 |            10 |           2343 |      543.15 | 5.4
2021-12-20 |            10 |           3453 |      343.12 | 3.1
2021-12-20 |            10 |          12543 |       353.9 | 8.1
``````

After ingestion data turns out like this:

``````__time     | ad_network_id | sum_ads_impressions| sum_ads_revenue | sum_ecpm_imp
---------------------------------------------------------------------------------
2021-12-20 |            10 |              18339 |         1240.17 |     124954.8
``````

Dimensions & metrics:

``````"metricsSpec": [
{
"type": "longSum",
},
{
"type": "doubleSum",
},
{
"type": "doubleSum",
"name": "sum_ecpm_imp",
"fieldName": "ecpm_imp"
}
],
"dimensionsSpec": {
"dimensions": [
{
"type": "long"
}
]
}
``````

``````"transforms": [
{
"type": "expression",
"name": "ecpm_imp",
"expression": "1.0 * ads_impressions * ecpm"
}
]
``````

After everything is done, using this query to get results:

``````SELECT
__time,
SUM(imp) AS impressions,
SUM(rev) AS revenue,
SUM(sum_ecpm/imp) AS ecpm
FROM (
SELECT
__time,
SUM(sum_ecpm_imp) AS sum_ecpm
FROM my_testing_datasource
)
``````

Not sure if WITH or nested select is faster…

Results from query:

``````__time     | ad_network_id |     impressions |      revenue |   ecpm
-------------------------------------------------------------------
2021-12-20 |            10 |          18339  |     1240.17  | 6.8136
``````
1 Like

I don’t even think you need a sub-SELECT. This should be fine:

``````SELECT
__time,
Basically, I think you use the inner SELECT just to sum up the impressions in order to get the common denominator for all `ecpm` values. But you can just aggregate this on the fly in the main query.