Multitenancy

Hello druid-users,
We are planning to set up a Druid cluster that would be used by multiple users, and we have the following questions:

  1. Is it possible to apply user-level resource quotas in Druid? For example, can we say that the queries fired by user X may only take up 30% of the cluster at any point? I didn't find anything about this in the documentation, so I am assuming such a feature may not be present; in that case, is there some workaround to achieve resource isolation in a rough sense?
  2. Is it possible to apply a resource quota limit at the storage level? For example, can I define a maximum size for a data source beyond which it is not allowed to grow?
  3. Do organizations use Druid in multi-tenant scenarios, or do most of them set up a different cluster for each requirement? If they do share clusters, how have they implemented these checks?
  4. Even if these features don't exist, are there any discussions around implementing them in the future?

Thanks

Rohit

We use Druid in a multi-tenant environment. In general, multi-tenancy is good for query speed because excess capacity is shared. Right NOW, all the limitations on cluster usage are done at a level above Druid.

Multi-tenancy on the real-time indexing side is not as good right now, because the Druid indexer doesn't have great support for per-tenant resource-usage awareness. If all your tenants use approximately the same resources for real-time indexing, then it's fine; but if there is a large disparity between tenants, you can hit corner cases where one or another performs sub-optimally, or where, on rare occasions, they have a perfect storm of resource collisions.

If you’re just talking about QUERY multi-tenancy, then yes, Druid does fine in our experience.

And to your second point: our cost accounting is related to data size, so if tenants want to send us more data, that is good, and we will happily grow to whatever size is needed to fit their requirements.

Note that you CAN set up multiple historical tiers to help control cost, at the expense of more relaxed speed expectations. Placing data in different tiers is controlled by two criteria: data source and age of data. For us, the different tiers revolve around the amount of CPU and memory available per on-disk byte of queryable data, where the more performant tiers have more memory and CPU available over less data. So you can say “All data from the last 5 weeks must be really fast, data up to 13 months must be fast, and data up to 3 years must be queryable.” See http://www.slideshare.net/CharlesAllen9/programmatic-bidding-data-streams-druid#34 for a screenshot of our metrics cluster (our cluster for monitoring our cluster).
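As a rough illustration of the tiering policy described above, here is a sketch of Druid coordinator load rules expressed as Python dicts (the JSON you would POST to the coordinator's rules endpoint). The tier names ("hot", "warm", "cold") and replicant counts are hypothetical; the rule types (`loadByPeriod`, `dropForever`) and the `tieredReplicants` field follow Druid's coordinator rule format, where rules are evaluated in order and the first match wins.

```python
# Sketch of coordinator load rules for the tiering policy described
# above. Tier names and replicant counts are made up for illustration.
load_rules = [
    # Last 5 weeks: keep on the most performant tier.
    {"type": "loadByPeriod", "period": "P5W", "tieredReplicants": {"hot": 2}},
    # Up to 13 months: a mid-range tier.
    {"type": "loadByPeriod", "period": "P13M", "tieredReplicants": {"warm": 2}},
    # Up to 3 years: still queryable, on the cheapest tier.
    {"type": "loadByPeriod", "period": "P3Y", "tieredReplicants": {"cold": 1}},
    # Anything older is dropped from the historicals.
    {"type": "dropForever"},
]
```

Rules like these are set per data source (with a cluster-wide default), which is what lets you give each tenant its own retention and speed trade-off.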

There are efforts currently in play to get better access rights in place (https://github.com/druid-io/druid/issues/2355), but I haven't seen an ask for more limits natively supported by Druid. Would you mind filing some GitHub issues at https://github.com/druid-io/druid/issues, preferably one issue per “thing you would like Druid to do”?

Charles,

Thanks for a detailed response.
My major concerns are around QUERY multi-tenancy as of now, but we would definitely need isolation on the real-time ingestion side later.

You mentioned that “all the limitations on cluster usage are done at a level above Druid”. Are these query limitations based on the number of queries per user, or on the total query cost per user?

If we implement it just on the number of queries per user, then users firing complex queries over larger time intervals will benefit over users firing smaller queries; hence I was thinking it would be good if a cost could be assigned to each query.

My plan B is to implement it in a layer above Druid, but if it is a generic requirement of Druid users, we could work together to add these features to Druid natively, probably as a separate component or by enhancing the broker.

Thanks

Rohit

Internally we rely on the middle-tier logic to be nice regarding how it queries the data.

It is worth noting that Druid does support the concept of query priority, where the results for a higher-priority query are processed first. (Some of our very large, asynchronously processed, low-priority reports can take many minutes to finish.)
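For illustration, here is a sketch of how a priority might be attached to a Druid query via its `context` field. The `priority` context key is Druid's query-context parameter (higher values are processed sooner, default 0); the data source name, intervals, and specific priority values below are made up.

```python
# Sketch of setting query priority via Druid's query context.
# The data source and intervals are hypothetical.
interactive_query = {
    "queryType": "timeseries",
    "dataSource": "tenant_x_events",   # hypothetical data source
    "granularity": "hour",
    "intervals": ["2016-01-01/2016-01-02"],
    "aggregations": [{"type": "count", "name": "rows"}],
    # Dashboard-style query: jump ahead of background report queries.
    "context": {"priority": 100},
}

# Same query shape, but demoted so interactive traffic goes first.
batch_report_query = dict(interactive_query, context={"priority": -100})
```

In a multi-tenant setup, the layer above Druid would stamp each tenant's queries with an appropriate priority before forwarding them to the broker.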

Auto-detection of query complexity is not something that Druid natively supports, so tuning priority and deciding how many concurrent queries are “appropriate” is still very dependent on the querying layer, which has to figure out what works best for that cluster and set of use cases.
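Since Druid does not auto-detect complexity, a querying layer typically picks priorities with its own heuristic. Here is one hypothetical sketch: use the total time span a query covers as a rough cost proxy and demote wide scans. The thresholds and priority values are invented for illustration.

```python
from datetime import datetime

# Hypothetical heuristic for a layer above Druid: treat the total time
# span a query covers as a rough cost proxy and lower the priority of
# wide scans so small interactive queries are processed first.
def assign_priority(intervals):
    """intervals: list of (start, end) datetime pairs the query covers."""
    total_days = sum((end - start).days for start, end in intervals)
    if total_days <= 7:
        return 100   # small interactive query: go first
    elif total_days <= 90:
        return 0     # Druid's default priority
    else:
        return -100  # large report: let smaller queries jump ahead
```

The returned value would be placed into the query's `context` as `{"priority": ...}` before the query is sent to the broker.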