thetaSketch intersection precision is uncontrollable

Hi,
I am using druid-0.9.2 with thetaSketch intersection to do retention analysis. To get more precise results, we changed the thetaSketch size from the default 16384 to 1048576.

The precision is fine (error rate within 1%) in most cases, but when intersecting a small set with a relatively big set, the precision is uncontrollable.

For example, in one of our production dataSources:

case     day-1 users   users in the following 7 days   estimated intersection   real intersection   error rate

case 1   22112         3965203                         14275                    14441               1.1%

case 2   71260         1096026                         28298                    30845               8.2%
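
The error rates quoted above appear to be computed as (real - estimated) / real. A quick check under that assumption, with the numbers copied from the two cases (the function name `relative_error` is just for illustration):

```python
# (estimated intersection, real intersection) copied from the two cases above
cases = {"case 1": (14275, 14441), "case 2": (28298, 30845)}


def relative_error(estimated, real):
    """Relative error of the sketch estimate against the exact count."""
    return abs(real - estimated) / real


for name, (estimated, real) in cases.items():
    print(f"{name}: {relative_error(estimated, real):.2%}")
```

This gives about 1.1% and 8.3%, in line with the figures in the post.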

Has anyone encountered the same problem? Any ideas or optimizations?

Any information would be appreciated!

My (layman’s) understanding of thetaSketch is that the error is relative to the size of the original sets, not the size of the result. So if you do an intersection or set difference and the result is “small” compared to the original sets, then the relative error on that result can be large.
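
To make that intuition concrete, here is a toy simulation. It is a minimal KMV-style sketch written for illustration only, not the DataSketches implementation that Druid uses, and all names (`unit_hash`, `build_sketch`, `intersect`, `estimate`) and set sizes are made up. The idea it shows: a sketch of a big set only keeps hashes below a threshold theta, so an intersection with that set is estimated from the few shared hashes that happen to fall below theta.

```python
import hashlib


def unit_hash(item):
    """Map an item to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64


def build_sketch(items, k):
    """Return (theta, retained): keep only hashes below the (k+1)-th smallest."""
    hashes = sorted({unit_hash(x) for x in items})
    if len(hashes) <= k:
        return 1.0, set(hashes)  # exact mode: nothing was sampled away
    theta = hashes[k]
    return theta, set(hashes[:k])


def intersect(a, b):
    """Intersect two sketches: smaller theta wins, keep shared hashes below it."""
    theta = min(a[0], b[0])
    return theta, {h for h in a[1] & b[1] if h < theta}


def estimate(sketch):
    """Estimated distinct count: retained hashes scaled up by 1/theta."""
    theta, retained = sketch
    return len(retained) / theta


K = 4096                                       # toy sketch size
small = build_sketch(range(0, 2_000), K)       # 2,000 users, kept exactly
big = build_sketch(range(1_000, 200_000), K)   # 199,000 users, 1,000 shared
both = intersect(small, big)

print("big set estimate:", round(estimate(big)), "(true value 199000)")
print("intersection estimate:", round(estimate(both)), "(true value 1000)")
print("hashes backing the intersection:", len(both[1]))
```

In this toy setup the big set is estimated from all 4096 retained hashes, but the intersection is backed by only a couple of dozen of them, so its relative error is much larger. As a rough rule of thumb (an approximation, not a guarantee from the library), the intersection is backed by about theta times the true intersection size, and its relative error scales like one over the square root of that number.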

Maybe someone more familiar with the algorithm can comment further.