Hi,
In the attached GroupBy query, I am computing two DataSketches theta sketches (trip count in Jan and trip count in Feb), plus the count of trips in both Jan & Feb.
The query works fine (if you remove percent_of_both_jan_feb_to_only_jan_trips).
I am trying to compute a percentage from the sketches (% of combined trips to Jan trips), but I get the following error message:
org.apache.druid.query.aggregation.datasketches.theta.SketchHolder cannot be cast to java.lang.Number
Is there a way to compute count from DataSketch (something like Approx_count_DS_Theta() SQL function) and then compute %?
Thanks in advance.
Regards, Chari.
14_native_json_GroupBy_ThethaSketch_NOT_PercentComputation_NOT_WORKING.json (2.28 KB)
In the arithmetic postagg, instead of:
{ "type" : "fieldAccess", "name" : "b", "fieldName" : "unique_trips_jan_day1" }
try using a thetaSketchEstimate postagg on unique_trips_jan_day1 instead. The fieldAccess would output the sketch object rather than the estimate.
Hi Jonathan,
I tried the following, but it results in the error given below.
{
  "type" : "arithmetic",
  "name" : "percent_of_both_jan_feb_to_only_jan_trips",
  "fn" : "/",
  "fields" : [
    { "type" : "fieldAccess", "name" : "unique_trips_feb_day1", "fieldName" : "final_unique_trips_inJan_notInFeb" },
    { "type" : "thetaSketchEstimate", "name" : "unique_trips_jan_day1", "fieldName" : "unique_trips_jan_day1" }
  ]
}
{
  "error" : "Unknown exception",
  "errorMessage" : "Instantiation of [simple type, class org.apache.druid.query.aggregation.datasketches.theta.SketchEstimatePostAggregator] value failed: field is null (through reference chain: org.apache.druid.query.groupby.GroupByQuery[\"postAggregations\"]->java.util.ArrayList[2]->org.apache.druid.query.aggregation.post.ArithmeticPostAggregator[\"fields\"]->java.util.ArrayList[1])",
  "errorClass" : "com.fasterxml.jackson.databind.JsonMappingException",
  "host" : null
}
I also tried setting both types to thetaSketchEstimate, but that results in the same error (only the index changed, to java.util.ArrayList[0]).
Please let me know.
Regards, Chari.
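A note for readers who hit the same error: the message says "field is null", which is consistent with the thetaSketchEstimate post-aggregator expecting a nested "field" post-aggregator rather than a "fieldName" string. The fragment below is a minimal sketch of that shape, assuming unique_trips_jan_day1 is a thetaSketch aggregator in the query. The output name unique_trips_jan_day1_estimate is hypothetical (JSON has no comments, so it is flagged here instead).

```json
{
  "type" : "thetaSketchEstimate",
  "name" : "unique_trips_jan_day1_estimate",
  "field" : { "type" : "fieldAccess", "fieldName" : "unique_trips_jan_day1" }
}
```

An element like this could stand in for the second entry of "fields" in the arithmetic post-aggregator above, so that the division receives a number instead of a sketch object.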
Hi Jonathan,
Attached updated JSON query spec, this works.
It looks like a “post aggregation” metric can’t be reused inside another post-aggregation metric. I copied the “intersect” metric definition into the other post-aggregation metric, and it works.
I hope the “intersect” post aggregation gets computed only once. 
Regards, Chari.
14_native_json_GroupBy_ThethaSketch_NOT_PercentComputation_PostAggregation.json (2.75 KB)
Hey Chari,
I think it will get computed twice. If that’s a problem, you could try putting the “intersect” at the top level of postAggregations, and then referring to it using a “fieldAccess” in the other two postaggs.
Gian
Hi Gian,
I tried, but I get the following error. The postAgg is “new_percent” in the JSON query attached to this mail.
Can you please help and tell me if I am doing anything wrong?
Regards, Chari.
test.json (2 KB)
Hey Chari,
I think your “new_percent” aggregator is probably the wrong type. It looks like it is supposed to be a division not a thetaSketchEstimate. How about something like:
{
  "type" : "arithmetic",
  "name" : "new_percent",
  "fn" : "/",
  "fields" : [
    { "type" : "fieldAccess", "fieldName" : "final_unique_trips_both_Jan_Feb" },
    { "type" : "fieldAccess", "fieldName" : "unique_trips_jan_day1_sketch_count" }
  ]
}
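To show how the pieces might fit together, here is a hedged sketch of a full postAggregations list in which the intersection estimate is defined once at the top level and then reused by name. The attached query is not shown in this thread, so the following are assumptions: unique_trips_jan_day1 and unique_trips_feb_day1 are thetaSketch aggregators, the other two post-aggregators are defined as below, and both_jan_feb_sketch is a hypothetical name for the set operation.

```json
{
  "postAggregations" : [
    {
      "type" : "thetaSketchEstimate",
      "name" : "final_unique_trips_both_Jan_Feb",
      "field" : {
        "type" : "thetaSketchSetOp",
        "name" : "both_jan_feb_sketch",
        "func" : "INTERSECT",
        "fields" : [
          { "type" : "fieldAccess", "fieldName" : "unique_trips_jan_day1" },
          { "type" : "fieldAccess", "fieldName" : "unique_trips_feb_day1" }
        ]
      }
    },
    {
      "type" : "thetaSketchEstimate",
      "name" : "unique_trips_jan_day1_sketch_count",
      "field" : { "type" : "fieldAccess", "fieldName" : "unique_trips_jan_day1" }
    },
    {
      "type" : "arithmetic",
      "name" : "new_percent",
      "fn" : "/",
      "fields" : [
        { "type" : "fieldAccess", "fieldName" : "final_unique_trips_both_Jan_Feb" },
        { "type" : "fieldAccess", "fieldName" : "unique_trips_jan_day1_sketch_count" }
      ]
    }
  ]
}
```

The ordering is deliberate: the two estimates come before the arithmetic post-aggregator that refers to them. Note that "/" yields a ratio between 0 and 1, not a value scaled to 100.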
Hi Gian,
Thank you. My bad, I should have caught that mistake.
Regards, Chari.