Arithmetic operations on DataSketch-Theta


In the attached GroupBy query, I am computing two DataSketches (trip count in Jan and trip count in Feb), plus the count of trips in both Jan and Feb.

The query works fine (if you remove percent_of_both_jan_feb_to_only_jan_trips).

I am trying to compute a percentage on the DataSketch (the percentage of combined trips relative to Jan trips). I get the following error message:

org.apache.druid.query.aggregation.datasketches.theta.SketchHolder cannot be cast to java.lang.Number

Is there a way to compute a count from the DataSketch (something like the Approx_count_DS_Theta() SQL function) and then compute the percentage?

Thanks in advance.

Regards, Chari.

14_native_json_GroupBy_ThethaSketch_NOT_PercentComputation_NOT_WORKING.json (2.28 KB)

In the arithmetic postagg, instead of:

{ "type" : "fieldAccess", "name" : "b", "fieldName" : "unique_trips_jan_day1" }

try using a thetaSketchEstimate postagg on unique_trips_jan_day1. The field access outputs the sketch object rather than the numeric estimate, which is why the cast to Number fails.
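For reference, a minimal sketch of what that suggestion looks like as a standalone post-aggregator. It follows the shape given in the Druid DataSketches extension documentation, where thetaSketchEstimate wraps another post-aggregator under "field". The output name "unique_trips_jan_day1_estimate" is hypothetical and not taken from the attached query.

```json
{
  "type": "thetaSketchEstimate",
  "name": "unique_trips_jan_day1_estimate",
  "field": { "type": "fieldAccess", "fieldName": "unique_trips_jan_day1" }
}
```

The same object can be placed directly inside the "fields" array of an arithmetic post-aggregator in place of the plain fieldAccess.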

Hi Jonathan,

I tried the following, but it results in the error given below.

"type" : "arithmetic",
"name" : "percent_of_both_jan_feb_to_only_jan_trips",
"fn" : "/",
"fields" : [
{ "type" : "fieldAccess", "name" : "unique_trips_feb_day1", "fieldName" : "final_unique_trips_inJan_notInFeb" },
{ "type" : "thetaSketchEstimate", "name" : "unique_trips_jan_day1", "fieldName" : "unique_trips_jan_day1" }


"error" : "Unknown exception",
"errorMessage" : "Instantiation of [simple type, class org.apache.druid.query.aggregation.datasketches.theta.SketchEstimatePostAggregator] value failed: field is null (through reference chain: org.apache.druid.query.groupby.GroupByQuery[\"postAggregations\"]->java.util.ArrayList[2]->[\"fields\"]->java.util.ArrayList[1])",
"errorClass" : "com.fasterxml.jackson.databind.JsonMappingException",
"host" : null

I also tried setting both types to thetaSketchEstimate, but that results in the same error (only the index in the reference chain changes, to java.util.ArrayList[0]).

Please let me know.

Regards, Chari.
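A note on the error above: "field is null" is consistent with how thetaSketchEstimate is declared in the Druid DataSketches extension documentation. It takes a nested post-aggregator under "field", not an aggregator name under "fieldName". Below is a minimal sketch of the corrected arithmetic post-aggregator. It assumes that final_unique_trips_inJan_notInFeb already resolves to a numeric estimate; that post-aggregator is not shown in the thread, so this is a guess about the attached query.

```json
{
  "type": "arithmetic",
  "name": "percent_of_both_jan_feb_to_only_jan_trips",
  "fn": "/",
  "fields": [
    { "type": "fieldAccess", "fieldName": "final_unique_trips_inJan_notInFeb" },
    {
      "type": "thetaSketchEstimate",
      "name": "unique_trips_jan_day1",
      "field": { "type": "fieldAccess", "fieldName": "unique_trips_jan_day1" }
    }
  ]
}
```

If the numerator is itself a sketch rather than a number, it would need the same thetaSketchEstimate wrapping.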

Hi Jonathan,

Attached is the updated JSON query spec; this works.

It looks like a “post aggregation” metric can’t be re-used in another post-aggregation metric. I copied the “intersect” metric definition into the other post-aggregation metric, and it works.

I hope the “intersect” post aggregation gets computed only once. :)

Regards, Chari.

14_native_json_GroupBy_ThethaSketch_NOT_PercentComputation_PostAggregation.json (2.75 KB)

Hey Chari,

I think it will get computed twice. If that’s a problem, you could try putting the “intersect” at the top level of postAggregations, and then referring to it using a “fieldAccess” in the other two postaggs.
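A rough sketch of that layout, so the intersect is defined once and then referenced by name. It assumes the two thetaSketch aggregators are named unique_trips_jan_day1 and unique_trips_feb_day1, and that a post-aggregator may reference one defined earlier in the same list. The nested name "trips_both_jan_feb_sketch" is hypothetical; the other names are taken from this thread.

```json
{
  "postAggregations": [
    {
      "type": "thetaSketchEstimate",
      "name": "final_unique_trips_both_Jan_Feb",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "trips_both_jan_feb_sketch",
        "func": "INTERSECT",
        "fields": [
          { "type": "fieldAccess", "fieldName": "unique_trips_jan_day1" },
          { "type": "fieldAccess", "fieldName": "unique_trips_feb_day1" }
        ]
      }
    },
    {
      "type": "thetaSketchEstimate",
      "name": "unique_trips_jan_day1_sketch_count",
      "field": { "type": "fieldAccess", "fieldName": "unique_trips_jan_day1" }
    },
    {
      "type": "arithmetic",
      "name": "percent_of_both_jan_feb_to_only_jan_trips",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "final_unique_trips_both_Jan_Feb" },
        { "type": "fieldAccess", "fieldName": "unique_trips_jan_day1_sketch_count" }
      ]
    }
  ]
}
```

The two estimates come first so that the arithmetic post-aggregator only ever sees numbers, never sketch objects.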


Hi Gian,

I tried, but I get the following error. The postAgg is “new_percent” in the JSON query attached to this mail.

Can you please help and tell me if I am doing anything wrong?

Regards, Chari.

test.json (2 KB)

Hey Chari,

I think your “new_percent” post-aggregator is probably the wrong type. It looks like it is supposed to be a division, not a thetaSketchEstimate. How about something like:

		"type" : "arithmetic",
		"name" : "new_percent",
		"fn" : "/",
		"fields" : [
			{"type": "fieldAccess", "fieldName" : "final_unique_trips_both_Jan_Feb" },
			{"type": "fieldAccess", "fieldName" : "unique_trips_jan_day1_sketch_count" }
		]

Hi Gian,

Thank you. My bad, I should have caught that mistake.

Regards, Chari.