Performance of Druid Queries: Multiple Smaller Individual Queries vs. One Bigger Query?

Got a question related to the performance of Druid queries.

Should I focus on creating multiple smaller individual queries that run in parallel vs one bigger query?

Which Druid query approach would be expected to have a lower Druid query latency? Which Druid query approach would be expected to handle more Druid clients?

Also, will utilizing multiple filters change my approach?

For instance:

  • 5 Druid “timeseries” queries containing 1 aggregation per query versus one bigger “timeseries” Druid query containing 5 aggregations?
  • 5 Druid “topN” queries containing 1 aggregation per query versus one bigger “topN” Druid query containing 5 aggregations?
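To make the two options concrete, here is a minimal Python sketch that builds the native JSON query bodies for both approaches. The datasource `events`, the metric columns, the `country` dimension, the interval, and the use of `longSum` aggregators are all hypothetical placeholders. The code only constructs the query dictionaries; it does not send them to a Broker.

```python
import json

# Hypothetical datasource, metric columns, dimension, and interval; substitute your own.
DATASOURCE = "events"
METRICS = ["clicks", "views", "purchases", "bytes", "errors"]
INTERVAL = "2024-01-01/2024-01-08"


def agg(metric):
    # longSum is one of Druid's built-in aggregators; pick whatever type fits the column.
    return {"type": "longSum", "name": metric, "fieldName": metric}


def timeseries(aggregations):
    return {
        "queryType": "timeseries",
        "dataSource": DATASOURCE,
        "granularity": "day",
        "intervals": [INTERVAL],
        "aggregations": aggregations,
    }


def topn(aggregations, rank_by):
    # A topN ranks rows by exactly one metric, given in "metric".
    return {
        "queryType": "topN",
        "dataSource": DATASOURCE,
        "dimension": "country",
        "metric": rank_by,
        "threshold": 10,
        "granularity": "all",
        "intervals": [INTERVAL],
        "aggregations": aggregations,
    }


# Option A: five small queries, one aggregation each.
small_timeseries = [timeseries([agg(m)]) for m in METRICS]
small_topn = [topn([agg(m)], m) for m in METRICS]

# Option B: one bigger query carrying all five aggregations.
big_timeseries = timeseries([agg(m) for m in METRICS])
big_topn = topn([agg(m) for m in METRICS], METRICS[0])

print(json.dumps(big_timeseries, indent=2))
```

One caveat for topN: because a topN query ranks by a single `metric`, the combined query returns the top 10 countries by `clicks` with the other four sums attached, whereas the five separate queries each rank by their own metric and may return different sets of countries. The two topN forms are only interchangeable when a single ranking is what you want.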

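Generally, the one bigger query should perform better: Druid can compute all of the aggregations in a single pass over the data, instead of scanning the same segments and merging the results once per query.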

One exception is queries that return lots of rows (such as groupBy or topN queries with millions of rows in the result set). In those cases the overhead of merging results can be substantial, so it helps to break them up. Another exception is a UI that benefits from drawing results progressively; there it usually works better to issue queries as they are needed instead of querying everything up front.
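One way to break up a query with a very large result set is to split its interval into smaller chunks and issue one query per chunk. The answer doesn't say how to split, so the per-interval approach here is an assumption, as are the `events` datasource and `user_id` dimension. With `"granularity": "day"` and one-day chunks, each sub-query returns exactly the rows the full query would return for that day, so the partial results can simply be concatenated.

```python
from datetime import date, timedelta


def split_interval(start, end, step_days=1):
    """Yield ISO-8601 'start/end' interval strings covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=step_days), end)
        yield f"{cur.isoformat()}/{nxt.isoformat()}"
        cur = nxt


def groupby_query(interval):
    # Hypothetical datasource and dimension; "day" granularity keeps chunk
    # boundaries aligned with result buckets, so no cross-chunk merge is needed.
    return {
        "queryType": "groupBy",
        "dataSource": "events",
        "granularity": "day",
        "dimensions": ["user_id"],
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


queries = [groupby_query(iv) for iv in split_interval(date(2024, 1, 1), date(2024, 1, 4))]
for q in queries:
    print(q["intervals"][0])
# 2024-01-01/2024-01-02
# 2024-01-02/2024-01-03
# 2024-01-03/2024-01-04
```

If the grouping does not include time (for example `"granularity": "all"`), the same key can appear in several chunks, and the client has to combine those partial rows itself.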