What happens to Theta Sketch during Roll Up

Hi,

We want to understand how rollup works for theta sketch columns.

Assume we have the following dimensions spec and metrics spec:

"dimensionsSpec" :
{
  "dimensions" : [ "eventType", "eventId" ],
  "dimensionExclusions" :
}

"metricsSpec" :
[
  { "type" : "longSum", "name" : "eventCount", "fieldName" : "eventCount" },
  { "type" : "thetaSketch", "name" : "userId_Sketch", "fieldName" : "userId" }
]

Please note that we are building the theta sketch on a field that is not a dimension ("userId").

Given this spec, suppose we receive the following two rows, which fall into the same "queryGranularity" bucket:

{timestamp:1234,userId:"abc",eventType:"same",eventId:999,eventCount:1}

{timestamp:1234,userId:"def",eventType:"same",eventId:999,eventCount:1}

Since both rows have the same timestamp and the same dimension values, they will definitely be rolled up, so the following single row is what finally gets inserted into Druid:

{timestamp:1234,eventType:"same",eventId:999,eventCount:2, userId_Sketch:}

eventCount became 2 because its rollup aggregator is longSum.

But what about userId_Sketch? How is the theta sketch calculated during rollup? Ideally, we want these two rows to be counted as two unique users when we query with the thetaSketch aggregation.

Please help us understand how theta sketch rollup happens.

Thanks,

Pravesh Gupta

The sketch will represent the set of distinct userId values (two values in this example, "abc" and "def").
You can later merge such sketches across dimensions and get estimates of the distinct count.
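
To make the rollup concrete, here is a minimal Python sketch of what happens to the two rows from the question. This is not Druid code: a plain Python set stands in for the theta sketch, roll_up is a made-up helper name, and the timestamps are assumed to be already truncated to the query granularity.

```python
# Illustration only: a Python set stands in for the theta sketch column.
DIMENSIONS = ("eventType", "eventId")


def roll_up(rows):
    """Group rows by (timestamp, dimensions) and aggregate the metrics."""
    merged = {}
    for row in rows:
        key = (row["timestamp"],) + tuple(row[d] for d in DIMENSIONS)
        if key not in merged:
            merged[key] = {"eventCount": 0, "userId_Sketch": set()}
        merged[key]["eventCount"] += row["eventCount"]   # longSum
        merged[key]["userId_Sketch"].add(row["userId"])  # sketch update
    return merged


rows = [
    {"timestamp": 1234, "userId": "abc", "eventType": "same",
     "eventId": 999, "eventCount": 1},
    {"timestamp": 1234, "userId": "def", "eventType": "same",
     "eventId": 999, "eventCount": 1},
]

rolled = roll_up(rows)
for key, metrics in rolled.items():
    print(key, metrics["eventCount"], len(metrics["userId_Sketch"]))
```

The single rolled-up row carries eventCount 2 and a sketch that still knows about both users, so userId does not need to be a dimension for the distinct count to survive rollup.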

There are also post-aggregators that return error bounds (lower bound and upper bound) along with the estimate.
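
As an illustration, a native timeseries query that merges the stored sketches and asks for bounds might look roughly like the following, built here as a Python dict and serialized to JSON. The dataSource name, the interval and the output names are placeholders I made up. The errorBoundsStdDev field on the thetaSketchEstimate post-aggregator is how I remember the DataSketches extension docs, so please check it against your Druid version.

```python
import json

# Hypothetical query: "events" and the interval are placeholders.
query = {
    "queryType": "timeseries",
    "dataSource": "events",
    "granularity": "all",
    "intervals": ["2020-01-01/2020-01-02"],
    "aggregations": [
        # Merges the sketches stored in the rolled-up column.
        {"type": "thetaSketch", "name": "unique_users",
         "fieldName": "userId_Sketch"}
    ],
    "postAggregations": [
        {
            "type": "thetaSketchEstimate",
            "name": "unique_users_with_bounds",
            "field": {"type": "fieldAccess", "fieldName": "unique_users"},
            # Assumed option: number of standard deviations for the bounds.
            "errorBoundsStdDev": 2,
        }
    ],
}

body = json.dumps(query, indent=2)
print(body)
```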

The count is exact until the number of distinct values reaches a limit, after which the sketch goes into estimation mode. The limit depends on the parameter that controls sketch size and accuracy.
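
If it helps, the exact-then-estimate behaviour can be shown with a toy sketch in Python. This is a simplified "keep the k smallest hashes" variant written for illustration, not the DataSketches implementation that Druid uses. ToyThetaSketch and all of its method names are invented here, and k plays the role of the size parameter.

```python
import hashlib


class ToyThetaSketch:
    """Toy theta sketch: retains at most k hash values below theta."""

    def __init__(self, k=16):
        self.k = k
        self.theta = 1.0     # 1.0 means nothing has been discarded yet
        self.hashes = set()

    @staticmethod
    def _hash(value):
        # Map the value to a float in [0, 1) using 53 bits of SHA-256.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

    def _trim(self):
        # Over capacity: lower theta to the largest retained hash, drop it.
        while len(self.hashes) > self.k:
            self.theta = max(self.hashes)
            self.hashes.remove(self.theta)

    def update(self, value):
        h = self._hash(value)
        if h < self.theta:
            self.hashes.add(h)  # duplicates hash identically, so no-op
            self._trim()

    def union(self, other):
        merged = ToyThetaSketch(min(self.k, other.k))
        merged.theta = min(self.theta, other.theta)
        merged.hashes = {h for h in self.hashes | other.hashes
                         if h < merged.theta}
        merged._trim()
        return merged

    def is_estimation_mode(self):
        return self.theta < 1.0

    def estimate(self):
        # Exact while theta == 1.0, scaled up once hashes were discarded.
        return len(self.hashes) / self.theta


# The two rows from the question, merged the way rollup would merge them.
a, b = ToyThetaSketch(), ToyThetaSketch()
a.update("abc")
b.update("def")
merged = a.union(b)
print(merged.estimate(), merged.is_estimation_mode())
```

With only two users the sketch stays in exact mode. Once more than k distinct values have been seen, theta drops below 1.0 and the result becomes an estimate whose accuracy improves with larger k.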