One of our most frequent queries currently asks for the number of unique identifiers seen both between times A and B and between times C and D. My understanding is that this can be estimated as an intersection of HyperLogLogs, which can in turn be estimated from the union of HyperLogLogs.

Is it possible to either internally compute a HyperLogLog union between multiple separate intervals or compute the intersection directly?
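
For reference, the intersection-from-union estimate mentioned above is inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|. Below is a minimal sketch of that arithmetic in Python. The function names and the 2% relative error are assumptions for illustration only, not part of any library, and the inputs are plain numbers standing in for HyperLogLog cardinality estimates.

```python
def intersection_estimate(card_a, card_b, card_union):
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.

    Inputs are cardinality estimates (e.g. read from HyperLogLogs);
    the result is clamped at zero because estimation noise can push
    the raw difference negative.
    """
    return max(0, card_a + card_b - card_union)


def worst_case_abs_error(card_a, card_b, card_union, rel_err):
    """Rough upper bound on the absolute error of the intersection estimate.

    Assumes (hypothetically) that each of the three estimates can be off
    by the same relative error, and that the errors add up.
    """
    return rel_err * (card_a + card_b + card_union)


# Two intervals of 1,000,000 ids each that share only 10,000 ids.
a, b, union = 1_000_000, 1_000_000, 1_990_000
overlap = intersection_estimate(a, b, union)
bound = worst_case_abs_error(a, b, union, rel_err=0.02)
print(overlap, bound)
```

With these numbers the error bound is several times larger than the overlap itself, which is why small intersections are hard to estimate this way.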


Hi Patrick, the error rates are going to be very high for intersections computed from unions of HyperLogLog sets. You can instead look at some new libraries that are in development to do these types of queries approximately:


These libraries should hopefully be open sourced in the near future.

I understand how to compute the thetaSketch intersection of two fields in one row,

but I cannot figure out how to compute the thetaSketch intersection of the same field across different rows (for example, to compute the 7-day retained user count).

Please help me with this problem, thanks.
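
One way to think about the same-field, different-rows case: first union the values within each time window, then intersect the two window-level results. The sketch below only illustrates that logic. It uses exact Python sets as a stand-in for theta sketches, and the `(day, user_id)` row layout and the `retained_users` helper are hypothetical, not the API of any sketch library.

```python
from collections import defaultdict


def retained_users(rows, first_days, later_days):
    """Count users seen in both periods.

    rows: iterable of (day, user_id) pairs, one per event row.
    first_days / later_days: the days that make up each period.
    Sets stand in for theta sketches: set union plays the role of a
    sketch union, and set intersection the role of a sketch intersection.
    """
    per_day = defaultdict(set)  # day -> ids seen that day (one "sketch" per day)
    for day, user_id in rows:
        per_day[day].add(user_id)

    # Union the per-day sets inside each period.
    first = set().union(*(per_day.get(d, set()) for d in first_days))
    later = set().union(*(per_day.get(d, set()) for d in later_days))

    # Intersect the two period-level sets and count the survivors.
    return len(first & later)


rows = [(1, "u1"), (1, "u2"), (8, "u2"), (8, "u3")]
print(retained_users(rows, first_days=[1], later_days=[8]))
```

With real sketches the shape is the same: build one sketch of the field per window (for example by filtering on the time column), then apply the intersection to those two sketches instead of to two fields of one row.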

