Select distinct?

Is there a way to get a list of the distinct values a column has (over an interval)? Or to phrase it another way, I do a groupBy query over the interval and aggregate over a metric, which gets me what I want, but it seems like there should be a faster way since I just need a union of all the keys in segment dictionaries for the column of interest.

I’m ultimately trying to do a “TopN with groupBy” query, which doesn’t seem to be directly supported, so I’m testing out just submitting a series of topN queries with a filter for each “groupBy” value.
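For reference, the two approaches described above can be sketched as Druid native query payloads. This is a minimal sketch, not a definitive implementation: the data source `events`, the dimensions `country` and `page`, the metric `views`, and the helper names are all hypothetical, and the `longSum` aggregation assumes an integer metric. The code only builds the JSON bodies; how they are sent to the broker is left out.

```python
import json


def distinct_values_query(data_source, dimension, interval):
    """Build a groupBy query whose result rows list each distinct value of `dimension`."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": [dimension],
        # A cheap aggregation; only the dimension values in the result are needed.
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [interval],
    }


def topn_per_value_queries(data_source, group_dim, group_values,
                           topn_dim, metric, threshold, interval):
    """Build one topN query per "groupBy" value, each restricted by a selector filter."""
    queries = []
    for value in group_values:
        queries.append({
            "queryType": "topN",
            "dataSource": data_source,
            "granularity": "all",
            "dimension": topn_dim,
            "metric": metric,  # refers to the aggregation name below
            "threshold": threshold,
            "filter": {"type": "selector", "dimension": group_dim, "value": value},
            # Assumption: the metric is an integer column summed with longSum.
            "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
            "intervals": [interval],
        })
    return queries


# Hypothetical usage: all names and the interval are placeholders.
interval = "2016-01-01/2016-02-01"
print(json.dumps(distinct_values_query("events", "country", interval), indent=2))
for q in topn_per_value_queries("events", "country", ["US", "DE"],
                                "page", "views", 10, interval):
    print(json.dumps(q))
```

The distinct values would come from the groupBy result rows and then be fed in as `group_values`, so the total cost is one groupBy plus one topN per value.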



Hi Ron, you can use the hyperUnique or cardinality aggregators. The hyperUnique aggregator will be faster but requires a special column to be built at ingestion time. Please note that both return approximate results that are roughly 98% accurate. You can also look at PlyQL: issue SQL queries and let the internals of PlyQL figure out the best way to query Druid.
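As a rough illustration of the two aggregators mentioned above, here is a sketch of a timeseries query that estimates the number of distinct values in a column. The function name, the output name `distinct_estimate`, and the example column names are hypothetical. The `fields` key on the cardinality aggregator follows recent Druid documentation; the key name has differed between Druid versions, so treat it as an assumption and check the docs for your release. Note that either aggregator returns an approximate count, not the list of values.

```python
def approx_distinct_count_query(data_source, dimension, interval,
                                hyper_unique_column=None):
    """Build a timeseries query estimating how many distinct values a column has.

    If `hyper_unique_column` is given, it is assumed to be a column built with
    the hyperUnique aggregator at ingestion time; otherwise the cardinality
    aggregator is applied to the raw dimension at query time.
    """
    if hyper_unique_column is not None:
        aggregation = {
            "type": "hyperUnique",
            "name": "distinct_estimate",
            "fieldName": hyper_unique_column,
        }
    else:
        aggregation = {
            "type": "cardinality",
            "name": "distinct_estimate",
            # Assumption: "fields" is the key accepted by your Druid version.
            "fields": [dimension],
        }
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "aggregations": [aggregation],
        "intervals": [interval],
    }
```

For example, `approx_distinct_count_query("events", "country", "2016-01-01/2016-02-01")` would use the cardinality aggregator, while passing `hyper_unique_column="country_unique"` would switch to the pre-built hyperUnique column.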

Thanks for the reply, Fangjin. We need the exact values, but since discovering Plywood/PlyQL, we're hopefully going to use that and let you guys figure out the best way for us :)


Internally, we issue multiple queries.