We provide self-service analytics in the cloud based on Druid (among other technologies) in a multi-tenant architecture. We use the Kafka Indexing Service for data ingestion and S3 for deep storage - currently a single common bucket for all customers. We have separate Kafka topics for each customer and separate KIS processes.
Any ideas how to configure a separate S3 deep storage bucket per data source? (European customers would prefer to store data in Europe, US customers in the US, etc.)
I think you could do this with something like:
- Set up pools of middleManagers, each with a different S3 bucket configured
- Use the "affinity" feature to pin certain datasources to certain pools of middleManagers (and make sure to set strong: true)
- Make sure historicals have the ability to access all buckets
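As a rough sketch of the first two steps (bucket names, datasource names, and hostnames below are hypothetical; exact property names can vary by Druid version, so check the docs for yours):

```
# runtime.properties on the middleManagers in the EU pool
druid.storage.type=s3
druid.storage.bucket=example-druid-eu      # hypothetical EU bucket
druid.storage.baseKey=segments
```

Then pin an EU customer's datasource to that pool by POSTing an affinity config to the overlord's worker endpoint (/druid/indexer/v1/worker):

```
{
  "type": "equalDistribution",
  "affinityConfig": {
    "affinity": {
      "eu_customer_datasource": ["mm-eu-1.internal:8091", "mm-eu-2.internal:8091"]
    },
    "strong": true
  }
}
```

With strong set to true, tasks for that datasource wait for those workers rather than spilling over to middleManagers pointed at the wrong bucket.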
However, I’m not sure that it’d be the best solution to your problem as you stated it. Since Druid historical nodes store a relatively permanent copy of the data, if you have customers that want data stored in a particular region, you probably want to think about the historical nodes in addition to thinking about S3. So that points to deploying an entire independent Druid cluster in each region.
Thanks Gian, that helps.
Yes, I’m sure we will end up with an independent cluster for each region, but we want to start small for now.