Need a CountIfExistAggregator

Hi group,

I’m currently running Druid Hadoop indexing to ingest data with multiple schemas. For instance, we have these:
event0: {dim0,dim1,metric0,metric1}, event1: {dim2,metric2,metric3}

and I want to count the occurrences of event0 and event1. For now we are using a JavaScript aggregator, as below:
{
  "type": "javascript",
  "name": "count0",
  "fieldNames": ["dim0"],
  "fnAggregate": "function(current, a) { if (a == null) return current; else return current + 1; }",
  "fnCombine": "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset": "function() { return 0; }"
}


But as the data volume grows, I’m worried about the efficiency of the JavaScript aggregators. Is there a workaround for this? Or is the best way to write a CountIfExistAggregator, similar to the CountAggregator?

Hi Zhao,

Perhaps you could introduce a new dimension, ‘event_type’. This can then hold the value “event0” or “event1”. Then at query time, you can filter for only events with a certain event type. The normal counts will then be accurate.

This is all theoretical, and I’m not sure about the performance impact of this solution but it might be something you could try.

Greetings,

Maarten
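
For reference, the event_type idea could look roughly like the query below. This is only a sketch under assumptions: the datasource name "events", the interval, and an ingestion-time count metric named "count" are hypothetical and not taken from the thread. If rollup is enabled, summing the ingested count metric with longSum gives the number of raw events, whereas a query-time "count" aggregator counts Druid rows.

```json
{
  "queryType": "timeseries",
  "dataSource": "events",
  "granularity": "all",
  "intervals": ["2015-08-01/2015-09-01"],
  "filter": {
    "type": "selector",
    "dimension": "event_type",
    "value": "event0"
  },
  "aggregations": [
    { "type": "longSum", "name": "count0", "fieldName": "count" }
  ]
}
```

Counting event1 would need a second query with the filter value changed, or a groupBy on event_type.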

Hi,
Druid also supports filtered aggregators natively.

e.g.

{
  "type" : "filtered",
  "filter" : {
    "type" : "selector",
    "dimension" : "dim0",
    "value" : null
  },
  "aggregator" : {"type" : "count", "name" : "count0"}
}

more details can be found here -

http://druid.io/docs/latest/querying/aggregations.html
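
One thing to watch: a selector filter with a null value matches rows where dim0 is missing, so on its own it counts the rows that do not have dim0. To count rows where dim0 exists, as the JavaScript aggregator in the question does, the selector can be wrapped in a "not" filter. A minimal sketch, assuming the same dimension and output name as in the question:

```json
{
  "type": "filtered",
  "filter": {
    "type": "not",
    "field": {
      "type": "selector",
      "dimension": "dim0",
      "value": null
    }
  },
  "aggregator": { "type": "count", "name": "count0" }
}
```

A second filtered aggregator on dim2 (named, say, count1) would cover event1 in the same query. How null and empty-string values are matched may depend on the Druid version and its null-handling settings, so it is worth checking the result against a known sample.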

Cool! I’ll try it out.

Thanks Nishant!

On Monday, August 31, 2015 at 10:52:11 PM UTC+8, Nishant Bangarwa wrote:

Hi Maarten,

Thanks for the reply. It should work, but I think turning N metrics into ONE dimension has some performance impact: the row count grows by a factor of N, there are more bitmap indexes, and there are fewer metric columns. It is also inconvenient at query time, since some transformation is needed. So I decided not to go this way.

On Monday, August 31, 2015 at 8:03:30 PM UTC+8, Maarten Rijke wrote: