I’m currently running Druid Hadoop indexing on data that mixes events with different schemas. For instance, we have these:
event0: {dim0, dim1, metric0, metric1}
event1: {dim2, metric2, metric3}


I want to count the occurrences of event0 and event1. For now we are using a JavaScript aggregator like this:
{
  "type" : "javascript",
  "name" : "count0",
  "fieldNames" : ["dim0"],
  "fnAggregate" : "function(current, a) { if (a == null) { return current; } return current + 1; }",
  "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset" : "function() { return 0; }"
}


But as the data volume grows, I’m worried about the efficiency of the JavaScript aggregators. Is there a workaround for this, or is the best way to write a CountIfExistAggregator similar to the built-in CountAggregator?

Perhaps you could introduce a new dimension, ‘event_type’, that holds the value “event0” or “event1”. At query time you can then filter on a specific event type, and the normal count aggregator will give accurate counts.

This is all theoretical, and I’m not sure about the performance impact of this solution, but it might be something you could try.
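A minimal sketch of the query side of this idea, assuming event_type is already populated for every row at ingestion time. The datasource name events and the interval are hypothetical placeholders. It is a timeseries query restricted to one event type, with a plain count aggregator:

```json
{
  "queryType" : "timeseries",
  "dataSource" : "events",
  "granularity" : "all",
  "intervals" : ["2015-01-01/2015-02-01"],
  "filter" : {
    "type" : "selector",
    "dimension" : "event_type",
    "value" : "event0"
  },
  "aggregations" : [
    { "type" : "count", "name" : "count0" }
  ]
}
```

Changing the filter value to event1 gives the other count. Note that if rollup is enabled at ingestion, a query-time count aggregator counts Druid rows, not raw events. In that case the usual approach is to define a count metric in the ingestion metricsSpec (for example one named count, a hypothetical name) and sum it at query time with { "type" : "longSum", "name" : "count0", "fieldName" : "count" }.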



Druid also supports filtered aggregators natively.

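A selector filter with a null value matches rows where dim0 is missing or empty, so wrap it in a not filter to count only the rows where dim0 is present (the event0 rows):

{
  "type" : "filtered",
  "filter" : {
    "type" : "not",
    "field" : {
      "type" : "selector",
      "dimension" : "dim0",
      "value" : null
    }
  },
  "aggregator" : { "type" : "count", "name" : "count0" }
}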

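To count both event types in one pass without the JavaScript aggregator, a single timeseries query can carry one filtered count per event type. This is a sketch under a few assumptions: the datasource name events and the interval are hypothetical placeholders, dim0 is present only on event0 rows, and dim2 is present only on event1 rows.

```json
{
  "queryType" : "timeseries",
  "dataSource" : "events",
  "granularity" : "all",
  "intervals" : ["2015-01-01/2015-02-01"],
  "aggregations" : [
    {
      "type" : "filtered",
      "filter" : { "type" : "not", "field" : { "type" : "selector", "dimension" : "dim0", "value" : null } },
      "aggregator" : { "type" : "count", "name" : "count0" }
    },
    {
      "type" : "filtered",
      "filter" : { "type" : "not", "field" : { "type" : "selector", "dimension" : "dim2", "value" : null } },
      "aggregator" : { "type" : "count", "name" : "count1" }
    }
  ]
}
```

Because the filtered and count aggregators are native Druid aggregators, this should avoid evaluating JavaScript for every row, though the actual speedup is not measured here.
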
More details can be found here:

