In the docs for PartitionsSpec, under “single-dimension partitioning,” it says: “Each segment will contain all rows with values of that dimension in that range.”
Is there any way to keep this rule but still split a segment across multiple shards if it becomes too large? If I lowered targetPartitionSize or maxPartitionSize, would those limits take precedence and force another shard containing the same dimension values?
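For reference, here is roughly the shape of the partitionsSpec I'm asking about, a minimal sketch assuming the Hadoop batch indexer; the dimension name `customer_id` and the size values are illustrative, not our actual settings:

```json
"partitionsSpec": {
  "type": "dimension",
  "partitionDimension": "customer_id",
  "targetPartitionSize": 5000000,
  "maxPartitionSize": 7500000
}
```

The question is essentially whether lowering these two size values would override the “all rows with the same dimension value go in one segment” guarantee when a single customer_id exceeds them.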
I am encountering a problem where a large increase in one customer’s data (we partition by customer id) results in very uneven shard sizes, to the point that the Druid indexing jobs are failing with memory errors.
I realize hash-based partitioning is recommended for exactly this reason, but we chose to partition by customer id because our access patterns usually arrive in streams keyed by customer id, and we wanted to optimize query performance and minimize the data loaded into memory.