Expected variation in hyperunique counts


What is the expected variation in hyperUnique counts in Druid compared to an exactly computed unique count?

Is there a way to change this error bound at the cost of performance?

My daily event volume is in the range of 15-20 million, and the uniques are under 2 million.

Hourly events are around 1.5-2 million, with roughly 40-50k uniques (the daily unique count will be less than the sum of the hourly unique counts).



Hi Manohar, have you read this blog post? http://druid.io/blog/2012/05/04/fast-cheap-and-98-right-cardinality-estimation-for-big-data.html

Thanks Charles,

Strangely, I seem to be getting a 2.9% variation on a relatively small data set. I'm not sure if I am doing something wrong.


From the blog post:

  • "Increasing the number of buckets (the k) increases the accuracy of the approximation"
  • "Increasing the number of bits of your hash increases the highest possible number you can accurately approximate"

Is there a way to change these values as an end user of Druid to get better accuracy? My data set is relatively small under normal circumstances.
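For context, a HyperLogLog sketch with m buckets has a relative standard error of roughly 1.04/sqrt(m). Druid's built-in hyperUnique aggregator uses 2048 buckets (an 11-bit bucket index, per the blog post above), which works out to about a 2.3% standard error, so an observed 2.9% deviation is within roughly one sigma of expectation. A minimal sketch of the arithmetic (the bucket counts other than 2048 are illustrative, not Druid settings):

```python
import math

def hll_standard_error(index_bits: int) -> float:
    """Relative standard error of a HyperLogLog sketch with 2**index_bits buckets."""
    m = 2 ** index_bits
    return 1.04 / math.sqrt(m)

# 11 index bits (2048 buckets) is what Druid's hyperUnique uses;
# 14 and 16 bits are shown only to illustrate how error shrinks with m.
for bits in (11, 14, 16):
    print(f"{2 ** bits:>6} buckets -> ~{hll_standard_error(bits):.2%} standard error")
```

Note that the error only shrinks with the square root of the bucket count, so each halving of the error costs 4x the sketch size.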

If you're normally using small datasets and need more accurate results, you may want to look at theta sketches: https://datasketches.github.io/ , which are included in https://github.com/druid-io/druid/releases/tag/druid-0.8.3


You can find the docs for the Druid DataSketches module at http://druid.io/docs/0.8.3/development/datasketches-aggregators.html . It lets you make the trade-off between accuracy and sketch size; you can see the details on datasketches.github.io, as Charles pointed out.
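For reference, a theta sketch metric aggregator in a query or ingestion spec looks roughly like the sketch below, based on the docs linked above; the name and fieldName values are illustrative, and the size parameter (a power of two; 16384 is the documented default) is what controls the accuracy/size trade-off:

```json
{
  "type": "thetaSketch",
  "name": "unique_users",
  "fieldName": "user_id",
  "size": 16384
}
```

Larger size values give more accurate estimates at the cost of larger sketches stored in each segment.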

– Himanshu

Thanks ,

I am not able to view that page, which I assume has the right configuration to use (aggregatorName etc.).


Thanks and Regards


Manohar, try again. I fixed the problem with that bad link.