Reusing postAggregations with thetaSketch calculations

Hello

We have a complex formula of the form |(X1 ∩ Y1) ∪ (X2 ∩ Y2) ∪ … ∪ (Xn ∩ Yn)| which we answer by querying a Druid datasource with a thetaSketch metric (a query example is attached).

This query works well. The problem is that we also need the partial estimates: |X1 ∩ Y1|, |X2 ∩ Y2|, …, |Xn ∩ Yn|.

For that, we use n queries to Druid to get each such intersection.

Since we’re already querying for this data in the large query, is there a way to get the intermediate results without querying twice?

I know it is possible to create a postAggregation that computes the thetaSketchSetOp of Xi ∩ Yi without wrapping it in thetaSketchEstimate, and then reuse that postAggregation in two separate postAggregations: one that calculates |Xi ∩ Yi| and another that calculates the full formula. The problem is that the thetaSketch object representing Xi ∩ Yi is then returned in the query result, which makes the result enormous and unusable in our use case.
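
To make the approach above concrete, here is a minimal sketch in Python that builds the postAggregations part of such a native query. It is only an illustration, not the attached query: the aggregator names X1..Xn and Y1..Yn, the output names, and the helper functions are all hypothetical, and it assumes a post-aggregator can reference an earlier post-aggregator by name through fieldAccess.

```python
import json


def intersection_name(i):
    # Hypothetical output name for the intermediate sketch of Xi ∩ Yi.
    return f"x{i}_and_y{i}_sketch"


def build_post_aggregations(n):
    """Build post-aggregators for |Xi ∩ Yi| (i = 1..n) and for the full union.

    Assumes the query's "aggregations" section already defines thetaSketch
    aggregators named X1..Xn and Y1..Yn (hypothetical names).
    """
    post_aggs = []
    for i in range(1, n + 1):
        # Intermediate sketch Xi ∩ Yi, NOT wrapped in thetaSketchEstimate.
        # This is the entry that ends up (as a full sketch) in the result.
        post_aggs.append({
            "type": "thetaSketchSetOp",
            "name": intersection_name(i),
            "func": "INTERSECT",
            "fields": [
                {"type": "fieldAccess", "fieldName": f"X{i}"},
                {"type": "fieldAccess", "fieldName": f"Y{i}"},
            ],
        })
        # Partial estimate |Xi ∩ Yi|, reusing the intermediate sketch by name.
        post_aggs.append({
            "type": "thetaSketchEstimate",
            "name": f"x{i}_and_y{i}_estimate",
            "field": {"type": "fieldAccess", "fieldName": intersection_name(i)},
        })
    # Full formula: estimate of the union of all intermediate sketches.
    post_aggs.append({
        "type": "thetaSketchEstimate",
        "name": "full_formula_estimate",
        "field": {
            "type": "thetaSketchSetOp",
            "name": "full_formula_sketch",
            "func": "UNION",
            "fields": [
                {"type": "fieldAccess", "fieldName": intersection_name(i)}
                for i in range(1, n + 1)
            ],
        },
    })
    return post_aggs


print(json.dumps(build_post_aggregations(2), indent=2))
```

With this layout each result row carries n intermediate sketch objects next to the n + 1 numeric estimates, and those sketch objects are the part we would like to drop.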

Is there a way to suppress a postAggregation output in the query result? Alternatively, is there a better way of doing what I described without having to repeat calculations?

Thank you!

Roman

large-query-example.txt (3.55 KB)

Hey Roman,

There isn’t currently a way to suppress post-aggregator output in a native JSON query. Druid SQL does this automatically: it only returns the expressions you specify in the SELECT clause, not any intermediate results it has to generate. But Druid SQL, as of 0.13, doesn’t yet support datasketches.

No clever approach comes to mind right now, so I think you’d want one of two features: a way to suppress things from native query results (I could see it being useful for anything: dimensions, aggregators, or post-aggregators), or datasketches support in Druid SQL. IMO, the second one is cooler, since Druid SQL is cool. But if you’re interested in doing either one, pop on over to GitHub or the druid dev list to talk more!