Kafka Firehose vs Tranquility

I am trying to ingest data from Kafka into Druid. I am currently using the Kafka Firehose API.
I also read that Tranquility “handles partitioning, replication, service discovery, and schema rollover for you, seamlessly and without downtime”.

  1. What is the best way to ingest in realtime? (I am finding schema changes hard to manage using the Firehose API.)

The page https://github.com/druid-io/tranquility seems to assume that something like Samza or Storm is running on top of the data stream.

  1. Can Tranquility be used directly on top of Kafka, and is it advisable to do so?

Hey Saksham,

The two major ways of doing realtime ingestion right now are the Kafka firehose + realtime nodes, and tranquility + the indexing service. IMO the Kafka method is a bit simpler to get started with, but the tranquility method is more flexible.

The main advantages of tranquility are that it can do replication at any scale (the Kafka firehose cannot do partitioned replication) and that it manages schema changes for you (dimension/metric/queryGranularity). The main advantage of the Kafka firehose method is that you can restart your realtime nodes whenever you want and they will resume where they left off. Tranquility + the indexing service does have a mechanism for doing rolling updates, but it is a little more involved than “restart whenever you want”.

It’s totally doable to have a process that simply reads from Kafka and then turns around and writes to Druid using tranquility. You could write this using tranquility’s direct API, or you could use a very simple Storm topology or Samza job.
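If it helps, here is a rough Scala sketch of what that direct-API process looks like, modeled on the example in the tranquility README. The class and builder names (DruidBeams, DruidLocation, DruidRollup, ClusteredBeamTuning), the ZooKeeper address, datasource, dimensions, and metrics are placeholders written from memory, so treat it as the shape of the thing rather than copy-paste code, and check the README for the exact methods in your version. The Kafka consumer side is left out since any client works:

    // Rough sketch only: package and class names follow the tranquility README
    // (com.metamx.tranquility) from memory and may differ in your version.
    import com.metamx.common.Granularity
    import com.metamx.tranquility.beam.ClusteredBeamTuning
    import com.metamx.tranquility.druid.{DruidBeams, DruidLocation, DruidRollup, SpecificDruidDimensions}
    import io.druid.granularity.QueryGranularity
    import io.druid.query.aggregation.LongSumAggregatorFactory
    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.BoundedExponentialBackoffRetry
    import org.joda.time.{DateTime, Period}

    object KafkaToDruid {
      type Event = Map[String, Any]

      def main(args: Array[String]): Unit = {
        // Tranquility discovers the overlord and index tasks through ZooKeeper (Curator).
        val curator = CuratorFrameworkFactory.newClient(
          "zk.example.com:2181", // placeholder ZK quorum
          new BoundedExponentialBackoffRetry(100, 3000, 5)
        )
        curator.start()

        // The schema (dimensions, metrics, query granularity) lives here in the sender,
        // which is why schema changes become a redeploy of this process rather than a
        // realtime node spec change.
        val druidService = DruidBeams
          .builder((event: Event) => new DateTime(event("timestamp"))) // how to get each event's timestamp
          .curator(curator)
          .discoveryPath("/druid/discovery")
          .location(DruidLocation.create("overlord", "druid:firehose:%s", "my_datasource"))
          .rollup(DruidRollup(
            SpecificDruidDimensions(Seq("page", "user")),
            Seq(new LongSumAggregatorFactory("edits", "edits")),
            QueryGranularity.MINUTE
          ))
          .tuning(ClusteredBeamTuning(
            segmentGranularity = Granularity.HOUR,
            windowPeriod = new Period("PT10M"),
            partitions = 1,  // tranquility handles partitioning...
            replicants = 2   // ...and replication across index tasks
          ))
          .buildService()

        // From your Kafka consumer loop, decode messages into Maps, batch them, and send:
        //   val accepted: com.twitter.util.Future[Int] = druidService(batchOfEvents)
        // Events with timestamps older than the windowPeriod are dropped, so watch the counts.
      }
    }

One thing to keep in mind with this approach: because tranquility drops events outside the window period, a Kafka consumer that falls far behind will silently lose data, so monitor the accepted counts it returns.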

This might also be interesting:
https://groups.google.com/forum/#!searchin/druid-development/fangjin$20yang$20"thoughts"/druid-development/aRMmNHQGdhI/muBGl0Xi_wgJ