Trouble getting started with Tranquility + Samza

Hi! I’m looking to setup a small Druid cluster for evaluation in my office, and I’m a little stuck on how best to stream data into Druid via the Indexing Service.

We’re pretty invested in Samza for dealing with all of our streaming data from Kafka, but I’ll admit right away that I’m not terribly familiar with Samza’s paradigm - and that may very well be why I’m confused about Tranquility! :slight_smile:

I’m getting lost trying to wrap my head around the example Samza task shown in the Github readme (https://github.com/druid-io/tranquility/blob/master/README.md).

Is this code example something that could (with an appropriate data set) be run all by itself as an independent Samza stream task? Or am I supposed to implement MyBeamFactory alongside my stream task and feed data to this class? Or does it expect to be run apart from Samza and fed data from the stream task while running as its own process?

My apologies if these are noob-ish questions, but as an Ops guy my dev chops are a bit weak!

FWIW, I would just try to dump the example code into a Samza task and run it to see what it does, but our Samza environment is entirely Clojure-based… so getting it ported will take me a while, and it will be easier on me if I understand where Tranquility fits into the puzzle so that I can build my task up from scratch. :slight_smile:

Thanks for your time!

Hey Cody,

The idea is that tranquility + your BeamFactory can act as a SystemProducer for Samza. If you root the properties under “samza.druid” (e.g. systems.druid.beam.factory) then it will be activated for any outgoing messages you send with the system “druid”. The stream should be the name of the Druid datasource you want to write to.

So what you can do to enable that is:

  1. Write a StreamTask that produces the kind of messages you want to write.

  2. Write a BeamFactory that builds a beam stack that can forward those messages to Druid. It needs to know how to extract a timestamp, serialize the messages to JSON (it uses Jackson by default, but you can override that), and create the appropriate Druid schema with dimensions, aggregators, etc.

  3. Set systems.druid.beam.factory to the BeamFactory class you wrote, and set systems.druid.samza.factory to “com.metamx.tranquility.samza.BeamSystemFactory”.