Filtering metrics

Hey,

I am working on a use case where I need the count of records whose metric value is greater than 0.

I went through the Druid documentation but could only find ways to aggregate on filtered dimensions, not on metrics. Since my query has other aggregations as well, I cannot filter out the records at the query level. Hence, I thought of using JavaScript aggregators.

Following is the aggregator I have come up with:

{
  "type": "javascript",
  "name": "cnt",
  "fieldNames": ["metricName"],
  "fnAggregate": "function(current, a) { if (a === 0 || a === null) { return current; } else { return current + 1; } }",
  "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset": "function() { return 0; }"
}

However, this doesn't seem to give the correct results. Can someone please point out whether the logic is flawed, or where I am going wrong?

Also, it would be great if someone could explain exactly how the JavaScript aggregator works.

Thanks

Asra

To add to the above question, my query is a topN query, so I cannot use a groupBy with a 'having' clause.

Also, the count of the metric is needed grouped by a dimension; a rough sketch of the query shape follows below.
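For reference, this is roughly how such an aggregator would sit inside the topN query (the datasource, dimension, threshold, and interval here are placeholders, not my actual values):

{
  "queryType": "topN",
  "dataSource": "my_datasource",
  "granularity": "all",
  "dimension": "someDimension",
  "metric": "cnt",
  "threshold": 10,
  "intervals": ["2016-01-01/2016-01-08"],
  "aggregations": [
    {
      "type": "javascript",
      "name": "cnt",
      "fieldNames": ["metricName"],
      "fnAggregate": "function(current, a) { if (a === 0 || a === null) { return current; } else { return current + 1; } }",
      "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
      "fnReset": "function() { return 0; }"
    }
  ]
}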

Hey Asra,

What about the results you get is not correct? What result do you get and what did you expect?

Hey Gian

I am comparing the results against a Hive query that I am firing, and the number from the Druid query is lower than the number from the Hive query.

Does the aggregation seem right to you?

Hey,

Had another question: how does Druid ingest null metrics, i.e. metrics whose values are not present in the payload? Does it ingest them as null, as 0, or as some other value?

I wonder if the lower number in Druid is due to multiple input rows being rolled up into a single Druid row. Can you try summing a count column in your JS aggregator instead of adding 1 for each Druid row?
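Something along these lines, as a rough sketch. It assumes your ingestion spec defines a count metric named "count" that records how many input rows were rolled up into each Druid row; adjust both field names to match your actual schema:

{
  "type": "javascript",
  "name": "cnt",
  "fieldNames": ["metricName", "count"],
  "fnAggregate": "function(current, a, c) { if (a === 0 || a === null) { return current; } else { return current + c; } }",
  "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset": "function() { return 0; }"
}

Here fnAggregate receives one argument per entry in fieldNames, so "c" is the rollup count for that Druid row and gets added whenever the metric is non-zero.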

For Float and Long metrics null is equivalent to 0.

Hey Nishant,

So when I do a sum via the JS aggregator, I get the correct value of the sum. The aggregator looks like this:

{
  "type": "javascript",
  "name": "cnt",
  "fieldNames": ["metricName"],
  "fnAggregate": "function(current, a) { if (a === 0 || a === null) { return current; } else { return current + a; } }",
  "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset": "function() { return 0; }"
}

Also, another thing I noticed: when I change fnReset to return 1 instead of 0, the count increases by 4. Any idea why that is?

My metric is of type doubleSum, so does that mean the null values will be ingested as 0?

The segment that I am querying is stored as 4 shards. Maybe that's why the count increases by 4 when I increase the fnReset return value.

Yes, all null values will be ingested as 0.
Your reasoning for the increase in count by 4 looks correct: fnReset provides the initial value for each per-shard partial aggregation, so with 4 shards the extra 1 is added 4 times when the partials are combined. The reset function should return 0, not 1.