[druid-user] hyperUnique

Hi, how do I use hyperUnique?

I want to get the equivalent of the following query by using hyperUnique.

SELECT COUNT(DISTINCT(dimension)) FROM <datasource>

But I can’t understand the result. What does that mean?



I believe your screenshot shows the ingestion setup? If so, it looks like you’re telling Druid to ingest a hyperUnique-type column from your existing data.

First, I would use the official Apache DataSketches HLL sketch over hyperUnique; see “DataSketches HLL Sketch module” in the Apache Druid docs.
Is there a reason for using hyperUnique?

Second, there are two ways to use sketches for approximation in Druid: either only at query time, or by building sketches into the data itself at ingestion time (which is what it looks like you’re doing there).

Check out this doc for information on the specific SQL functions to use:

Note that you can apply these functions to “a regular column or an HLL sketch column”. If you apply them to a sketch column, (a) it’s more efficient for your underlying data capacity, and (b) it’s faster :)
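
To make the two modes concrete, here is a minimal sketch, assuming the druid-datasketches extension is loaded. The names `my_datasource`, `user_id` and `user_id_hll` are hypothetical placeholders, not taken from your setup.

```sql
-- Mode 1 (query time only): user_id is a regular column, so the
-- HLL sketch is built on the fly while the query runs.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id) AS unique_users
FROM my_datasource

-- Mode 2 (sketch in the data): user_id_hll is assumed to be an HLL sketch
-- column created at ingestion time; run this as a separate query.
SELECT APPROX_COUNT_DISTINCT_DS_HLL(user_id_hll) AS unique_users
FROM my_datasource
```

Both return an estimate rather than an exact count. The second form only has to merge the stored sketches instead of reading every raw value.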

Does this help?

- pete

Just to add further detail: the setup shows that Druid will ingest a sketch summarizing the values of the dimension that you’re rolling up. At query time, you can then query that sketch column to get the (estimated) distinct counts.
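
For the hyperUnique column from the original question, a query along these lines should give the estimated distinct count. This is a sketch under assumptions: `unique_dimension` is a hypothetical name for the hyperUnique metric defined in the ingestion spec, and `my_datasource` is a placeholder. As far as I know, `APPROX_COUNT_DISTINCT` accepts either a regular column or a hyperUnique column.

```sql
-- unique_dimension: hypothetical hyperUnique metric created at ingestion.
-- The result is an approximation of COUNT(DISTINCT dimension), not an exact count.
SELECT APPROX_COUNT_DISTINCT(unique_dimension) AS distinct_estimate
FROM my_datasource
```

Because the value comes from a HyperLogLog-style sketch, expect it to differ slightly from the exact distinct count.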