I have two user columns: one is ingested with the hyperUnique column type and the other with the thetaSketch column type.
The exact distinct count of users for a single day in Hive is 38030.
But when I ingest the HLL and thetaSketch columns in Druid, I get 35240 (HLL) and 35530 (thetaSketch).
The error is about 7.3% for HLL and about 6.6% for thetaSketch.
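For reference, this is roughly how the error figures work out from the counts above. It is only a quick sketch in Python: the `relative_error` helper and the dictionary labels are names I made up for illustration, and the Hive count is treated as the exact value.

```python
def relative_error(estimate: int, exact: int) -> float:
    """Relative error of an approximate count against the exact count."""
    return abs(estimate - exact) / exact

exact = 38030  # distinct users for one day, from Hive
estimates = {"hyperUnique (HLL)": 35240, "thetaSketch": 35530}  # from Druid

errors = {name: relative_error(value, exact) for name, value in estimates.items()}
for name, err in errors.items():
    print(f"{name}: {err:.2%}")
# hyperUnique (HLL): 7.34%
# thetaSketch: 6.57%
```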
Does anyone know why the error could be so high? I even tried thetaSketch with a size of 131072, which is supposed to give at most 0.83% error in 99.73 percent of cases.
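As a sanity check on that expectation, here is a small sketch of where the 0.83% figure seems to come from. I am assuming the usual approximation that a theta sketch of size k has a relative standard error of about 1/sqrt(k), and that 99.73% corresponds to three standard deviations; `theta_error_bound` is just an illustrative helper, not a Druid or DataSketches function.

```python
import math

def theta_error_bound(k: int, num_std_devs: float = 3.0) -> float:
    """Approximate relative error bound, assuming RSE ~ 1/sqrt(k)."""
    return num_std_devs / math.sqrt(k)

k = 131072  # sketch size used at ingestion
bound = theta_error_bound(k)
print(f"{bound:.2%}")
# 0.83%
```

So the roughly 6.6% error I observe is several times larger than this approximate bound.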