Hi folks,
I’m trying to calculate percentiles in druid query console. From what I understand from documentation, APPROX_QUANTILE_DS(), DS_GET_QUANTILE() etc. provided in Datasketches extension prove to be useful to reach an approximate value. But is there an way to fetch interpolated values from druid that are equivalent to standard results.
Case -
Dataset
5.71, 6.33, 9.64, 10.74, 18.39, 29.92, 57.06, 58.41, 113.27, 119.73
Percentiles
Excel/Actual Druid
6.268 6.33 10th Percentile
9.915 9.64 25th Percentile
24.155 29.92 50th Percentile
58.0725 58.41 75th Percentile
113.916 113.27 90th Percentile
Standard Percentile Calculation (50th)
-
Find index/rank: ((50/100)*(10+1)) = 5.5
-
If it is a whole number x, the answer is the datapoint at xth position.
-
If it is not a whole number, do interpolation.
5.5 → integer part: 5, fractional part: 0.5
3i. Take (rank) and (rank+1) rows: 18.39 and 29.92 and multiply by the fractional part.
(29.92 - 18.39) * 0.5 = 5.765
3ii. Add it to the smaller value (18.39):
18.39 + 5.765 = 24.155