Securing a Druid cluster

Hi all,
I am evaluating Druid for large-scale OLAP with additional tools like Caravel or Metabase, but I can’t find any way to secure the cluster. I can imagine isolating the cluster with firewalls and relying on Caravel for authentication and permissions, but my problem is data ingestion.

I am currently using Spark and Kafka, but neither allows secure data ingestion. The Kafka connector doesn’t support Kafka’s secured API, and the Spark BeamRDD is a push-based approach, so I would need to allow access from the Spark workers, which is insecure for multi-tenant Spark clusters.

How do you deploy Druid and use it in a secure way?

As you said, the common approach I have seen is to put a web service/app layer in front of Druid that handles security on the query side.

On the ingestion side, Druid works well with Kerberos for batch ingestion. Realtime ingestion is trickier. I have seen some use cases with Storm where you open ACLs for the port range that the Druid workers listen on.

Hope this is helpful.

Hey Vincent,

Like Slim said, for the query side, the common solution is firewalls + authorization at some service layer in front of Druid.
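To make the service-layer idea concrete, here is a minimal sketch of the authorization check such a layer might run before forwarding a native query to the broker. The permission table, user names, and function names are all hypothetical; it only assumes the standard `dataSource` shapes of native Druid queries (a plain string, or an object of type `table`, `union`, or `query`), and it rejects anything it does not recognize.

```python
# Hypothetical permission table: user -> Druid dataSources that user may query.
PERMISSIONS = {
    "alice": {"sales", "clicks"},
    "bob": {"clicks"},
}


def query_datasources(query):
    """Return the set of table names a native Druid query reads from."""
    ds = query.get("dataSource")
    if isinstance(ds, str):
        return {ds}
    if isinstance(ds, dict):
        kind = ds.get("type")
        if kind == "table":
            return {ds["name"]}
        if kind == "union":
            return set(ds["dataSources"])
        if kind == "query":
            # Nested query: authorize against the inner query's tables.
            return query_datasources(ds["query"])
    raise ValueError("unsupported or missing dataSource")


def is_allowed(user, query, permissions=PERMISSIONS):
    """True only if the user may read every dataSource the query touches."""
    try:
        needed = query_datasources(query)
    except (ValueError, KeyError, TypeError, AttributeError):
        # Fail closed on anything malformed or unrecognized.
        return False
    return bool(needed) and needed <= permissions.get(user, set())
```

In a real deployment the layer would authenticate the caller first, run a check like `is_allowed`, and only then proxy the request to the broker, which itself stays behind the firewall.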

For ingestion, the best path is probably using a secured Kafka cluster + the new Kafka indexing service (http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html). That extension uses the Kafka 0.9 client, which should be new enough to work with a secured Kafka cluster if you set the appropriate client properties. But I haven’t actually tried it to make sure it works; if you have any issues getting that to work, could you please post about them?
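As a rough illustration of "setting the appropriate client properties", the sketch below builds a Kafka supervisor spec whose `consumerProperties` carry standard Kafka client SSL settings. It is untested against a real secured cluster; the function name, host names, paths, and password are placeholders, and the `dataSchema` is passed in rather than guessed.

```python
import json


def secured_kafka_supervisor_spec(data_schema, topic, bootstrap_servers,
                                  truststore_path, truststore_password):
    """Build a Kafka indexing service supervisor spec with SSL client settings.

    The consumerProperties keys are ordinary Kafka consumer configs; which
    ones you need depends on how the Kafka cluster is secured (assumption:
    SSL with a client truststore).
    """
    return {
        "type": "kafka",
        "dataSchema": data_schema,
        "tuningConfig": {"type": "kafka"},
        "ioConfig": {
            "topic": topic,
            "consumerProperties": {
                "bootstrap.servers": bootstrap_servers,
                "security.protocol": "SSL",
                "ssl.truststore.location": truststore_path,
                "ssl.truststore.password": truststore_password,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spec = secured_kafka_supervisor_spec(
        {"dataSource": "events"}, "events", "kafka1.example.com:9093",
        "/etc/druid/kafka.truststore.jks", "changeit")
    print(json.dumps(spec, indent=2))
```

The resulting JSON would be submitted to the overlord's supervisor endpoint (`/druid/indexer/v1/supervisor`) like any other supervisor spec.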

And what about securing the REST API to provide authentication and permissions? I saw something about an experimental feature…

Here is an explanation of how to use that feature:

https://groups.google.com/forum/#!topic/druid-user/23nVku3G4Rw/discussion