Kafka - Spark - Druid Integration

Hi everyone!

I’m a student and I’m exploring the Kafka - Spark - Druid integration through Tranquility.

Before this test, I had already tested the Kafka - Druid integration through Tranquility, but now I want to add Spark to the architecture and I’m having some difficulty integrating Spark and Druid.

So far, I have found some examples of how to do this integration, but I’m dealing with two limitations:

Is it still applicable?

My Druid version is 0.10.1, installed on a Hadoop cluster. Also, I’m developing in Java.

Can you please give me some pointers?

Thanks in advance,

José Correia

If you want to approach it from Java, at a high level you’ll need to construct a Tranquilizer so that there is one available on each executor JVM. This means you’ll want to broadcast your configuration and have a factory lazily initialize the Tranquilizer on demand. You will want to replicate, in Java, the processing loop at https://github.com/druid-io/tranquility/blob/master/spark/src/main/scala/com/metamx/tranquility/spark/BeamRDD.scala#L42
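
For reference, such a factory might look roughly like the sketch below. This is only an illustration, not code from the library: TranquilizerFactory is a made-up name, and it assumes you ship the tranquility JSON config to the executors as raw bytes (e.g. via a broadcast variable) and then build the Tranquilizer with the DruidBeams calls shown in the Tranquility Java example.

import com.metamx.tranquility.config.DataSourceConfig;
import com.metamx.tranquility.config.PropertiesBasedConfig;
import com.metamx.tranquility.config.TranquilityConfig;
import com.metamx.tranquility.druid.DruidBeams;
import com.metamx.tranquility.tranquilizer.Tranquilizer;
import java.io.ByteArrayInputStream;
import java.util.Map;

// Hypothetical helper: at most one Tranquilizer per executor JVM,
// created lazily the first time a task on that executor asks for it.
public class TranquilizerFactory
{
  private static Tranquilizer<Map<String, Object>> instance = null;

  // configBytes is the tranquility JSON config, shipped to the executors
  // (for example via a Spark broadcast variable).
  public static synchronized Tranquilizer<Map<String, Object>> getOrCreate(
      final byte[] configBytes,
      final String dataSource
  )
  {
    if (instance == null) {
      final TranquilityConfig<PropertiesBasedConfig> config =
          TranquilityConfig.read(new ByteArrayInputStream(configBytes));
      final DataSourceConfig<PropertiesBasedConfig> dataSourceConfig =
          config.getDataSource(dataSource);
      instance = DruidBeams.fromConfig(dataSourceConfig)
                           .buildTranquilizer(dataSourceConfig.tranquilizerBuilder());
      instance.start();
    }
    return instance;
  }
}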

Essentially, what is happening in that BeamRDD loop is:

  1. For each RDD (each Spark Streaming micro-batch is an RDD),
  2. for each partition of that RDD, execute a callback that:
  3. gets a handle to a Tranquilizer singleton (local to the executor JVM), and
  4. loops through the partition’s row iterator, pushing each row to the Tranquilizer.

I don’t know how familiar you are with Spark, but the line “rdd.foreachPartition” executes on the driver, while the body of “partitionOfRecords => {” executes on the worker executors.
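
In Java, the whole loop might look something like this sketch. Again, this is an assumption-laden illustration rather than tested code: it presumes a JavaSparkContext named jsc, a JavaDStream<Map<String, Object>> named events, the config JSON already read into a byte[] called configBytes, the hypothetical TranquilizerFactory above, and a placeholder datasource name.

import org.apache.spark.broadcast.Broadcast;
import com.metamx.tranquility.tranquilizer.Tranquilizer;
import java.util.Map;

// On the driver: broadcast the raw tranquility config so every executor can read it.
final Broadcast<byte[]> configBroadcast = jsc.broadcast(configBytes);

events.foreachRDD(rdd -> {                      // executes on the driver, once per micro-batch
  rdd.foreachPartition(partitionOfRecords -> {  // this lambda body executes on the executors
    final Tranquilizer<Map<String, Object>> sender =
        TranquilizerFactory.getOrCreate(configBroadcast.value(), "myDataSource");
    while (partitionOfRecords.hasNext()) {
      // send() returns a Twitter Future; attach callbacks if you need per-message error handling
      sender.send(partitionOfRecords.next());
    }
    sender.flush(); // block until everything pushed from this partition has been processed
  });
});

The synchronized lazy init in the factory is what keeps you at one Tranquilizer per executor JVM, even when several partitions of the same batch land on the same executor.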

Kyle

Hello Kyle!

Thanks for your response - I almost forgot to come back and thank you!

Your response was helpful to define the first steps.

I succeeded in integrating Kafka - Spark - Druid in a test cluster.

Now I’m having trouble migrating my solution to a Kerberized cluster.

I posted the problem here: https://groups.google.com/forum/#!topic/druid-user/l_Ddn9YuSKI

Do you know how to solve this problem?

Best regards,

José Correia

On Wednesday, 7 March 2018 at 17:03:56 UTC, Kyle Boyle wrote:

Happy to help, glad it worked for you.

I’m not sure about your Kerberos issue. It looks like the error is an HTTP 401 on beam creation - how did you set up ZooKeeper? Does it require auth? Are you using Kerberos for it? Did you set up any Druid auth anywhere?

I’ve not worked with any of these platforms in a Kerberized setup, so hopefully someone with that experience will take a look at your post.

Regards
Kyle