How "easy" is it to contribute "new" aggregation types? (thinking about rating & ranking aggregations)

Hello,

“Inspired” by a previous question about “delta aggregations” (discrete differentiation) (https://groups.google.com/forum/#!topic/druid-user/DfrR76e-z8I), I’ve been thinking about contributing some new aggregation mechanisms, but I don’t know if it’s possible without a deep understanding of database theory.

My idea is not to contribute the proposed “delta aggregation”, but to contribute different “ranking aggregation” mechanisms, treating chosen integer columns as ranking positions and chosen floating-point columns as rating data. There are a lot of ranking (and rating) aggregation algorithms, some simpler than others, and I believe that a small subset of them could be implemented in an efficient, online way.

Kind regards.

Hey acorrea,

You can check out druid-datasketches and druid-histogram in the main Druid repo; both are fully worked examples of aggregations implemented as extensions. The basic requirements of an aggregator are:

  • An AggregatorFactory

  • An on-heap Aggregator returned by factory.factorize

  • An off-heap BufferAggregator returned by factory.factorizeBuffered

  • [optional] If you want to let users post-process your aggregator for display in different ways, you can do that with PostAggregators.

  • [optional] If you want to create your data structure at ingestion time and store it in a column on disk, you do that with a ComplexMetricSerde.
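To make the division of labour between those pieces concrete, here is a minimal, language-neutral sketch in Python. It is only an illustration: the class and method names (MeanRatingAggregator, factorize_buffered, combine, finalize, and so on) are hypothetical stand-ins chosen to echo the list above, not Druid's actual Java interfaces or signatures. It assumes a simple "mean rating" metric whose state is a running sum plus a row count.

```python
import struct


class MeanRatingAggregator:
    """On-heap variant: the running state lives in ordinary object fields."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def aggregate(self, rating):
        self.total += rating
        self.count += 1

    def get(self):
        return (self.total, self.count)


class MeanRatingBufferAggregator:
    """Off-heap-style variant: the state lives in a caller-owned byte buffer.

    The aggregator itself holds no per-row state; it only knows how to read
    and write its fixed-size slot at a given position in the buffer.
    """

    LAYOUT = struct.Struct("<dq")  # float64 running sum, int64 row count

    def init(self, buf, position):
        self.LAYOUT.pack_into(buf, position, 0.0, 0)

    def aggregate(self, buf, position, rating):
        total, count = self.LAYOUT.unpack_from(buf, position)
        self.LAYOUT.pack_into(buf, position, total + rating, count + 1)

    def get(self, buf, position):
        return self.LAYOUT.unpack_from(buf, position)


class MeanRatingAggregatorFactory:
    """Hypothetical factory: hands out both variants and merges partial states."""

    def factorize(self):
        return MeanRatingAggregator()

    def factorize_buffered(self):
        return MeanRatingBufferAggregator()

    @staticmethod
    def combine(lhs, rhs):
        # Merge two partial (sum, count) states computed independently.
        return (lhs[0] + rhs[0], lhs[1] + rhs[1])

    @staticmethod
    def finalize(state):
        # Post-processing step: turn the mergeable state into a display value.
        total, count = state
        return total / count if count else None
```

The point of keeping (sum, count) rather than the mean itself is that the intermediate state stays mergeable; the division happens only once, at the very end.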

Hi Gian,

Thank you. I’ll take a look at this.

As long as the aggregations are pretty much self-contained in one metric (you don’t try to pull in multiple metrics at once), and the aggregation methodology itself is commutative, then it should be very feasible to create an extension.
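As a rough illustration of what "self-contained and commutative" means for a ranking aggregation, here is a small Python sketch of a Borda-style count. It is an assumption-laden example, not anything from Druid: the function names (borda_partial, merge) and the row shape (item, position) are hypothetical, and Borda counting is simply one of the simpler ranking aggregation schemes, picked here for illustration. Each segment is folded into a partial score table, and the partial tables can be merged in any order.

```python
from itertools import permutations


def borda_partial(rows, num_candidates):
    """Fold (item, position) rows from one segment into Borda points.

    position is 1-based; first place earns num_candidates - 1 points and
    last place earns 0.
    """
    scores = {}
    for item, position in rows:
        scores[item] = scores.get(item, 0) + (num_candidates - position)
    return scores


def merge(lhs, rhs):
    """Combine two partial score tables; integer addition makes this
    independent of the order in which partials arrive."""
    merged = dict(lhs)
    for item, points in rhs.items():
        merged[item] = merged.get(item, 0) + points
    return merged


# Three hypothetical segments, each holding one voter's ranking of a, b, c.
segments = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("b", 1), ("a", 2), ("c", 3)],
    [("a", 1), ("c", 2), ("b", 3)],
]
partials = [borda_partial(rows, num_candidates=3) for rows in segments]

# Merging the partials in every possible order yields the same table.
results = []
for order in permutations(partials):
    total = {}
    for partial in order:
        total = merge(total, partial)
    results.append(total)

final_scores = results[0]
ranking = sorted(final_scores, key=lambda item: (-final_scores[item], item))
```

The scores are kept as integers on purpose: floating-point addition can give slightly different results depending on merge order, so a rating aggregation over floats would only be order-independent up to rounding. A "delta" between consecutive rows, by contrast, depends on row order and does not fit this pattern.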

By the way, this use case looks more like a new query type than an aggregator.
The way I see it, it is basically a diff (or an intersection) between Query_1 and Query_i.