Translating from SQL

I have set up an Apache Druid installation that ingests data from a Kafka topic. It works very smoothly and efficiently.

I’m currently trying to implement some queries I use in BigQuery, to see if Druid fits my use case. At the moment I’m stuck on two problems: counting the rows (grouped by some fields) for which a column value is an outlier, and pagination.

For the first problem, in BigQuery I essentially compute the first and third quartiles (q1 and q3) and then use something like the following (I’m interested only in “right” outliers):

SUM(IF(column_value > q3 + 1.5*(q3-q1), 1, 0))

This approach makes use of CTEs and joins: I compute the quartiles in a CTE with grouping, then join the result with the original table.
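For reference, the full BigQuery query looks roughly like this (my_table, group_field and column_value are placeholder names):

WITH quartiles AS (
  SELECT
    group_field,
    APPROX_QUANTILES(column_value, 4)[OFFSET(1)] AS q1,
    APPROX_QUANTILES(column_value, 4)[OFFSET(3)] AS q3
  FROM my_table
  GROUP BY group_field
)
SELECT
  t.group_field,
  -- count only the "right" outliers above q3 + 1.5 * IQR
  SUM(IF(t.column_value > q.q3 + 1.5 * (q.q3 - q.q1), 1, 0)) AS outlier_count
FROM my_table t
JOIN quartiles q USING (group_field)
GROUP BY t.group_field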

I was able to easily compute the quartiles and the outlier threshold with the DataSketches extension using a groupBy query, but I can’t figure out how to write a postAggregation that performs the count. In theory, I could run a second query using the thresholds obtained from the first; unfortunately, the grouping can produce hundreds of thousands of distinct values, which makes that approach unfeasible.
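For the record, the groupBy query computing the per-group quartiles looks roughly like this (datasource, dimension, metric names and interval are placeholders; this is an untested sketch):

{
  "queryType": "groupBy",
  "dataSource": "my_datasource",
  "granularity": "all",
  "intervals": ["2020-01-01/2020-02-01"],
  "dimensions": ["group_field"],
  "aggregations": [
    { "type": "quantilesDoublesSketch", "name": "sketch", "fieldName": "column_value" }
  ],
  "postAggregations": [
    { "type": "quantilesDoublesSketchToQuantile", "name": "q1",
      "field": { "type": "fieldAccess", "fieldName": "sketch" }, "fraction": 0.25 },
    { "type": "quantilesDoublesSketchToQuantile", "name": "q3",
      "field": { "type": "fieldAccess", "fieldName": "sketch" }, "fraction": 0.75 }
  ]
}

What I can’t see is how to feed q3 + 1.5 * (q3 - q1) back into a filtered count within the same query.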

Second problem: as of Druid 0.17 there is no query pagination support (it was formerly available through Select queries). I’m trying to emulate pagination somehow, and a possible approach might be to create a virtual dimension that numbers the rows of a query’s result set, so that I can filter on that dimension. This is easy to do in SQL with the RANK window function. I was wondering if there is anything similar in Druid.
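In SQL terms, the pattern I’d like to reproduce is something like this (page 2 with a page size of 100; my_table and ts are placeholder names):

SELECT *
FROM (
  SELECT t.*, RANK() OVER (ORDER BY ts) AS rn
  FROM my_table t
)
WHERE rn BETWEEN 101 AND 200

(With ties, RANK can repeat numbers; ROW_NUMBER would give a strictly sequential numbering instead.)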

Anyway, is there an established pattern for tackling these two problems?

Thanks in advance,

Domenico Delle Side