[druid-user] Downsampling with Druid

What are you hoping to do with the downsampled data? Do you need a set of items to pull into some other system, or do you need to run aggregations on it?

If you want to run aggregations on it, it’s worth first trying them directly in Druid, without downsampling.
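As a rough illustration of running the aggregation in Druid itself, here is a minimal Python sketch that builds a native timeseries query body. The datasource name, the interval, and the assumption that an ingestion-time "count" metric holds the number of rolled-up events are all placeholders, not taken from your schema. Summing that metric with longSum recovers the raw event count, which a plain row count would not.

```python
import json


def timeseries_query(datasource: str, interval: str,
                     granularity: str = "day", count_field: str = "count") -> dict:
    """Build a native Druid timeseries query body that aggregates over all rows.

    Assumption: `count_field` is a hypothetical ingestion-time metric that
    stores how many raw events were rolled up into each row.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],
        "aggregations": [
            # Summing the rolled-up count gives events, not rows.
            {"type": "longSum", "name": "events", "fieldName": count_field},
        ],
    }


if __name__ == "__main__":
    # "events" and the interval are placeholder values for illustration.
    body = timeseries_query("events", "2015-01-01/2015-02-01")
    # This JSON would be POSTed to the Broker's native query endpoint (/druid/v2/).
    print(json.dumps(body, indent=2))
```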

If you want to pull a set of items into some other system, then it depends on what kind of distribution you want to draw from. A topN query returns the top N items sorted by some metric, so it’s good if you want the most popular, biggest, or otherwise top-ranked set of things. A select query returns the first N rows sorted by time. If you want a roughly X% sample, you could run a query with a filter that keeps only rows where rand() < X/100. You could also weight this by the number of events that were rolled up into each row, so that a row representing many events is more likely to be kept, if you want to sample events rather than rows.
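To make the difference between a row sample and an event sample concrete, here is a client-side Python sketch of the sampling logic only. It is not a Druid API. The row shape and the "count" field are hypothetical, and the event-weighted version is one possible way to do the weighting: it keeps each underlying event with probability p, which matches sampling the raw events first and rolling them up afterwards.

```python
import random
from typing import Dict, Iterable, List

# Hypothetical rolled-up row: dimension values plus a "count" metric holding
# the number of raw events that were rolled up into the row at ingestion time.
Row = Dict[str, object]


def sample_rows(rows: Iterable[Row], p: float, rng: random.Random) -> List[Row]:
    """Row sample: keep each row independently with probability p.

    This is the "rand() < X/100" filter with p = X / 100, so roughly X% of rows
    survive, regardless of how many events each row represents.
    """
    return [row for row in rows if rng.random() < p]


def sample_events(rows: Iterable[Row], p: float, rng: random.Random,
                  count_field: str = "count") -> List[Row]:
    """Event sample: keep each underlying event independently with probability p.

    A row standing for k events keeps Binomial(k, p) of them. Rows that keep
    none are dropped, and surviving rows get their count reduced accordingly.
    """
    sampled: List[Row] = []
    for row in rows:
        k = int(row[count_field])
        # One Bernoulli draw per underlying event (simple, fine for a sketch).
        kept = sum(1 for _ in range(k) if rng.random() < p)
        if kept > 0:
            new_row = dict(row)  # copy so the input rows are not mutated
            new_row[count_field] = kept
            sampled.append(new_row)
    return sampled


if __name__ == "__main__":
    rng = random.Random(42)
    # Half the rows stand for 1 event, half for 50 events.
    rows = [{"page": f"p{i}", "count": 1 if i % 2 else 50} for i in range(1000)]
    row_sample = sample_rows(rows, 0.10, rng)
    event_sample = sample_events(rows, 0.10, rng)
    print(len(row_sample), "rows in the 10% row sample")
    print(sum(r["count"] for r in event_sample), "events in the 10% event sample")
```

With a row sample, the heavy 50-event rows are kept no more often than the 1-event rows; with the event sample, they dominate the result in proportion to the events they represent.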