We’re trying to understand how best to partition a large amount of data.
Though it appears to be stored in the segment metadata, it’s not clear to me if Druid queries respect all characteristics of a shard spec. If you create partitions by dimension (say tenant id), can queries limited on that dimension be smart enough to only go to the necessary partitions? Is this even a useful optimization (say for top-n queries) or is the increased parallelism of going to many partitions generally preferable?
Do any users manually partition by managing several datasources, and are queries that combine multiple sources performant?
Thanks for any tips.