Do broker queries understand partition dimensions?

Hi druids

We’re trying to understand how best to partition a large amount of data.

Though it appears to be stored in the segment metadata, it’s not clear to me if Druid queries respect all characteristics of a shard spec. If you create partitions by dimension (say tenant id), can queries limited on that dimension be smart enough to only go to the necessary partitions? Is this even a useful optimization (say for top-n queries) or is the increased parallelism of going to many partitions generally preferable?

Do any users manually partition by managing several datasources, and are queries that combine multiple sources performant?

Thanks for any tips.

Max

Hey Max,

The partition dimension is not currently used for pruning the list of queryable segments, but it can still make queries faster by giving you better locality and reducing the merging workload.

We have some docs about different ways you can partition data multi-tenant clusters: http://druid.io/docs/0.9.0-rc3/querying/multitenancy.html

Thanks. That makes sense, and that document has a lot of useful info.