User Authorization and Protecting Data stored in Druid

We need to protect the data stored in Druid on a per-user basis. I know that it is possible to do user authentication via servlet filters, but we need to go a step further and provide user authorization as well. The data we are ingesting will be tagged with different permissions / access levels, and our users are each granted different permissions / access levels, which we need to use to show each user only the data they are allowed to see. This is similar to something like Oracle’s Label Security, or to providing row/cell/document/column-level security on the data. As far as I know, there is no built-in way to accomplish this within Druid. Does anyone have ideas or suggestions on how to accomplish this within Druid? Without this, I’m not sure we will be able to keep using Druid as our analytics store, so I’m really hoping we can come up with some kind of solution!

Thanks

Andrew

Hi, one possible option is to write a proxy web service that performs the necessary authentication/authorization. You can hide the Druid nodes behind this web service.

– Himanshu

So you mean iterate over all the results before returning them through the proxy? That could be quite slow. Also, how do you project back the access-level dimensions of the data?

Andrew

I haven’t really thought it all the way through, but what I meant is that you could apply restrictions based on the datasource and columns (or whatever else you can see from the request/query).
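A rough sketch of that kind of request-level check, in Python, might look like the following. It assumes the proxy receives a Druid native JSON query and only inspects `dataSource`, `dimensions`/`dimension`, and each aggregator's `fieldName`. The `PERMISSIONS` table and the user, datasource, and column names are made up for illustration. Filters, post-aggregations, and virtual columns are not inspected here, and a real proxy would have to cover them too.

```python
from typing import Any

# Hypothetical permission table: user -> datasource -> columns the user may query.
PERMISSIONS: dict[str, dict[str, set[str]]] = {
    "andrew": {"sales": {"country", "revenue"}},
}


def referenced_columns(query: dict[str, Any]) -> set[str]:
    """Collect column names from the parts of a native query this sketch understands."""
    columns: set[str] = set()
    dims = list(query.get("dimensions", []))
    if "dimension" in query:  # topN queries carry a single "dimension"
        dims.append(query["dimension"])
    for dim in dims:
        # A dimension is either a plain string or a dimension-spec object.
        columns.add(dim["dimension"] if isinstance(dim, dict) else dim)
    for agg in query.get("aggregations", []):
        if "fieldName" in agg:
            columns.add(agg["fieldName"])
    return columns


def is_authorized(user: str, query: dict[str, Any]) -> bool:
    """Return True only if the user may read the datasource and every referenced column."""
    datasource = query.get("dataSource")
    if not isinstance(datasource, str):
        return False  # deny anything other than a plain table name
    allowed = PERMISSIONS.get(user, {}).get(datasource)
    if allowed is None:
        return False  # unknown user or datasource: deny by default
    return referenced_columns(query) <= allowed
```

The proxy would call `is_authorized(user, query)` before forwarding the query to the broker and reject the request otherwise. Denying by default keeps anything the sketch does not understand from slipping through.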

On re-reading your original question, I now understand that you really want to grant access based on the data itself and not on the datasource/columns. For that, yes, you will need to read through the rows returned by Druid. It will be slower, but not a showstopper if you iterate over the rows as a stream while they are returned from Druid, instead of building the full JSON in memory and iterating over the rows from there. For example, the Druid broker does something similar when reading/processing the JSON responses it collects from the historical nodes.
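To make the streaming idea concrete, here is a minimal Python sketch. It assumes the response body is a top-level JSON array of objects arriving in text chunks, that rows are groupBy-style with the values under an `event` key, and that the query was written to include the access-level dimension so each row carries its label. The field name `access_level` and both function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def iter_array_items(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield the objects of a top-level JSON array as soon as each one is complete."""
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # need more input
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a JSON array")
                started = True
                buf = buf[1:]
                continue
            if buf[0] == ",":
                buf = buf[1:]
                continue
            if buf[0] == "]":
                return  # end of the array
            try:
                item, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # the object is still incomplete; wait for the next chunk
            yield item
            buf = buf[end:]


def visible_rows(
    chunks: Iterable[str], user_labels: set[str], label_field: str = "access_level"
) -> Iterator[dict[str, Any]]:
    """Pass through only the rows whose label is one the user holds."""
    for row in iter_array_items(chunks):
        event = row.get("event", row)  # groupBy rows nest values under "event"
        if event.get(label_field) in user_labels:
            yield row
```

Only one row is held in memory at a time, so the proxy can start writing allowed rows to the client before Druid has finished responding. Re-attempting the decode on every chunk is wasteful for very large rows; an incremental parser would avoid that, but the overall shape stays the same.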

Please note that this is just something that came to mind and may not necessarily be the best approach. I don’t know of a simpler solution for now.

– Himanshu