High Cardinality TopN

Hello All,
I am using Druid to visualize a high cardinality data set. Ideally, I would like to issue a TopN query over my data set and have it return the dimensions with the highest average over the entire query interval.

Specifically, I would like to find the 5 dimension values with the highest average when all points are averaged across the entire dataset. Then I would like the query to return every data point in that interval for those values, timestamps included, so that I can draw a line graph.

Let me know if there are any questions or confusion. I really appreciate the help!

Hey Jerred,

You can do a topN by average using a topN query that has a postAggregator that divides a sum metric by an event count metric, and that has its ordering metric set to that postAggregator. See here for more details: http://druid.io/docs/latest/querying/topnquery.html
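As a rough sketch of such a topN, here is the query body built as a Python dict. The data source `metrics`, the dimension `host`, the metric columns `value_sum` and `count`, and the interval are all placeholder names I made up for illustration, so swap in whatever your schema uses.

```python
import json


def build_topn_by_average(data_source, dimension, sum_metric, count_metric,
                          interval, threshold=5):
    """Build a Druid topN query body ordered by an average post-aggregation.

    Every argument is supplied by the caller; nothing here is tied to a
    real schema.
    """
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "threshold": threshold,
        # Order the topN by the post-aggregator defined below.
        "metric": "avg_value",
        # "all" collapses the whole interval into a single bucket.
        "granularity": "all",
        "aggregations": [
            {"type": "doubleSum", "name": "total", "fieldName": sum_metric},
            {"type": "longSum", "name": "events", "fieldName": count_metric},
        ],
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "avg_value",
                "fn": "/",  # average = sum metric / event count metric
                "fields": [
                    {"type": "fieldAccess", "fieldName": "total"},
                    {"type": "fieldAccess", "fieldName": "events"},
                ],
            }
        ],
        "intervals": [interval],
    }


# Hypothetical names and interval, for illustration only.
topn_query = build_topn_by_average(
    data_source="metrics",
    dimension="host",
    sum_metric="value_sum",
    count_metric="count",
    interval="2016-01-01/2016-02-01",
)
print(json.dumps(topn_query, indent=2))
```

The printed JSON is what you would POST to the broker. The important part is that `metric` names the post-aggregator rather than one of the raw aggregators.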

Timeseries queries are probably the easiest way to draw line graphs: http://druid.io/docs/latest/querying/timeseriesquery.html
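For the line graph, one option is to issue one timeseries query per dimension value, each restricted with a selector filter. The sketch below only builds the query bodies. The names `metrics`, `host`, `value_sum`, `count`, and the values in `top_values` are hypothetical stand-ins (in practice the values would come from a topN result), and the hourly granularity is just a guess at what you want to plot.

```python
import json


def build_timeseries_for_value(data_source, dimension, value, sum_metric,
                               count_metric, interval, granularity="hour"):
    """Build a Druid timeseries query body for a single dimension value."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        # Keep only rows where the dimension equals the given value.
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "aggregations": [
            {"type": "doubleSum", "name": "total", "fieldName": sum_metric},
            {"type": "longSum", "name": "events", "fieldName": count_metric},
        ],
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "avg_value",
                "fn": "/",  # per-bucket average
                "fields": [
                    {"type": "fieldAccess", "fieldName": "total"},
                    {"type": "fieldAccess", "fieldName": "events"},
                ],
            }
        ],
        "intervals": [interval],
    }


# Hypothetical dimension values, hard-coded here for illustration.
top_values = ["host-a", "host-b", "host-c"]

timeseries_queries = {
    value: build_timeseries_for_value(
        data_source="metrics",
        dimension="host",
        value=value,
        sum_metric="value_sum",
        count_metric="count",
        interval="2016-01-01/2016-02-01",
    )
    for value in top_values
}

for value, query in timeseries_queries.items():
    print(value, json.dumps(query["filter"]))
```

Each query yields one series, so the client ends up making one request per line on the graph.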

Hope that’s helpful. If not, can you explain in a little more detail what you’re trying to do? (some equivalent SQL would help)

Hey Gian,
This is incredibly close to what I am looking for. I essentially want to take the result of the first query you described, then run a timeseries query for each of the dimension values it returns. Is this possible with some type of nested query?

If you need further explanation of what I am looking for, I will dig up some SQL syntax.

Thanks again!

Hi Jerred,

What you want to do is not possible in Druid, but it is a very valuable use case. It is exactly for these kinds of nested use cases that I developed Plywood (https://github.com/implydata/plywood). Plywood is an ORM-like layer and query optimizer for Druid.

Here is an example of Plywood doing an iterated nested topN: https://github.com/implydata/plywood/blob/master/docs/examples/example3-druid.js

You can use Plywood in 3 ways:

  1. Directly, as in the example above.

  2. Via PlyQL, a SQL-like wrapper for Plywood (https://github.com/implydata/plyql). Look at the last example in the README; it is literally what you are trying to do.

  3. Via Pivot (https://github.com/implydata/pivot), an open-source UI for Druid built on top of Plywood.

Here is a screenshot of Pivot doing the nested query that you are talking about: https://i.imgur.com/Sq2ikZb.png (Pivot is really just rendering the Plywood result; there is no fancy query logic in Pivot itself).

Hey Vadim,

This is exactly what I am looking for! I really appreciate the response. I’ll give Pivot and Plywood a shot.

Great. If you have any problems, please open an issue on the respective project.