We are about to implement a multi-tenant Druid schema, so our data source has a tenant_id column. What does setting a partitionSpec on an index task achieve?
Specifically, does it mean that we can index each tenant's data independently? That is, could we index the data for tenant 1 for hour x in one indexing job, and run a separate indexing job for tenant 2 for the same hour x? This would give us the flexibility to index or reindex the data for a particular tenant on its own.
We don’t want to have one data source per client, since that leads to substantial overhead.
"Index task" above refers to either the Hadoop index task or the overlord index task.
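
To make the question concrete, here is a rough sketch of the kind of setting we mean, assuming single-dimension partitioning on tenant_id in a Hadoop index task's tuningConfig. The field names follow the Hadoop partitionSpec as we understand it and may differ between Druid versions; the helper name build_tuning_config and the target size are placeholders of ours, not values from a real job.

```python
import json


def build_tuning_config(partition_dimension, target_partition_size):
    """Build a Hadoop-style tuningConfig fragment (hypothetical helper).

    Assumes single-dimension partitioning; the field names are our reading
    of the Hadoop index task spec and should be checked against the docs
    for the Druid version in use.
    """
    return {
        "type": "hadoop",
        "partitionSpec": {
            "type": "dimension",
            "partitionDimension": partition_dimension,
            # Placeholder row target per segment, not a tuned value.
            "targetPartitionSize": target_partition_size,
        },
    }


tuning_config = build_tuning_config("tenant_id", 5_000_000)
print(json.dumps(tuning_config, indent=2))
```

The printed JSON is the fragment that would sit under tuningConfig in the task spec. What we are unsure about is whether partitioning on tenant_id like this has any bearing on indexing tenants in separate jobs for the same hour.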