Hey guys, if you’ve been following the 0.8.3 release notes (https://github.com/druid-io/druid/issues/2044), you may have seen that we recently added a new module to Druid called DataSketches. It adds a new set of aggregators built around a sketch algorithm called the theta sketch. Theta sketches can be used to approximate the number of unique elements in a set, much like the ‘hyperUnique’ aggregator (based on HyperLogLog). One big difference between the two is that hyperUnique only supports set union, whereas theta sketches support union, intersection, A NOT B, and other types of set operations. At a high level, this means theta sketches can be used for much more complex set analysis and for use cases such as retention analysis. Theta sketches require more space than hyperUnique aggregators, so if you only need count-distinct-style operations, hyperUnique is still the way to go. To learn more about theta sketches, check out the blog post:
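
For a rough feel of how the set operations mentioned above work, here is a toy sketch in Python. It is not Druid's aggregator or the DataSketches library code, just a minimal "keep the k smallest hashes" illustration; the names `ToyThetaSketch`, `from_items`, `a_not_b`, and the default `k` are made up for this example.

```python
import hashlib
from dataclasses import dataclass


def _hash01(item) -> float:
    """Map an item to a deterministic pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    # Keep the top 53 bits so the value is exactly representable as a float.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


@dataclass(frozen=True)
class ToyThetaSketch:
    """Hypothetical toy theta sketch: a threshold plus the hashes below it."""
    theta: float        # sampling threshold in (0, 1]
    hashes: frozenset   # retained hash values, all strictly below theta

    @classmethod
    def from_items(cls, items, k=4096):
        hashed = sorted({_hash01(x) for x in items})
        if len(hashed) <= k:
            # Small set: keep everything, so the estimate is exact.
            return cls(1.0, frozenset(hashed))
        # Keep the k smallest hashes; theta is the (k+1)-th smallest.
        return cls(hashed[k], frozenset(hashed[:k]))

    def estimate(self) -> float:
        # Retained hashes are a uniform sample taken at rate theta.
        return len(self.hashes) / self.theta

    def _combine(self, other, kept):
        # Every set operation uses the smaller theta of the two inputs
        # and drops hashes at or above it. (No re-trimming to k here.)
        theta = min(self.theta, other.theta)
        return ToyThetaSketch(theta, frozenset(h for h in kept if h < theta))

    def union(self, other):
        return self._combine(other, self.hashes | other.hashes)

    def intersection(self, other):
        return self._combine(other, self.hashes & other.hashes)

    def a_not_b(self, other):
        return self._combine(other, self.hashes - other.hashes)


if __name__ == "__main__":
    # Made-up data: users active in week 1 and in week 2.
    week1 = ToyThetaSketch.from_items(range(0, 60000))
    week2 = ToyThetaSketch.from_items(range(40000, 100000))
    print("either week (true 100000):", round(week1.union(week2).estimate()))
    print("retained    (true 20000): ", round(week1.intersection(week2).estimate()))
    print("churned     (true 40000): ", round(week1.a_not_b(week2).estimate()))
```

The intersection and A NOT B results are what make something like retention analysis possible: both are computed from the sketches alone, without going back to the raw user IDs. The estimates are approximate, and the error grows when the result is small relative to the inputs.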

As the DataSketches library grows and gains more approximate algorithms, we’ll be adding them to Druid. To learn more about DataSketches, check out the website: