I’m a bit confused about Tranquility’s role in Druid. AFAIK, Tranquility acts as a consumer for a Druid data source (Kafka in my case) and creates a Peon on a MiddleManager. Do the Peon tasks on the MiddleManager answer real-time queries from the Broker, or does Tranquility buffer the events and answer the queries itself?
The follow-up question is: what instance type and configuration are recommended for Tranquility in a typical production deployment?
It would be great if the Druid team could set up a Slack team for Druid. I noticed that Confluent has a Slack team with separate channels for Ops, Streams, and Kafka Connect to share collective knowledge.
Tranquility Core is basically a client side library similar to a Kafka Producer. It is aware of the layout of your Druid cluster and it knows where to send data in order for it to be indexed in real time. We have some pre-built modules for integrating it with Samza, Spark, Flink, HTTP, etc, or you could use the Core library directly.
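To make the HTTP option concrete, here is a minimal Python sketch that posts a batch of JSON events to a Tranquility Server. The host, port, dataSource name, and field names here are placeholders for the example (port 8200 and the `/v1/post/{dataSource}` path are Tranquility Server’s documented defaults); check them against your own deployment and dataSource spec.

```python
import json
from urllib import request


def post_events(events, datasource, host="localhost", port=8200):
    """POST a batch of events to Tranquility Server's HTTP endpoint.

    The URL path and default port follow Tranquility Server's documented
    defaults; adjust both to match your deployment.
    """
    body = json.dumps(events).encode("utf-8")
    req = request.Request(
        f"http://{host}:{port}/v1/post/{datasource}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Tranquility replies with a JSON summary of received/sent counts.
        return json.loads(resp.read())


# Each event must carry the timestamp column that your Tranquility
# dataSource spec declares (assumed to be "timestamp" here).
events = [
    {"timestamp": "2017-01-01T00:00:00Z", "page": "home", "latency": 12},
    {"timestamp": "2017-01-01T00:00:05Z", "page": "search", "latency": 31},
]
payload = json.dumps(events)
```

Note that Tranquility drops events whose timestamps fall outside the configured window period, so the sender should watch the returned counts rather than assume every event was accepted.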
The two main options for real-time ingestion in Druid today are Tranquility and the Kafka Indexing Service (http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html). The latter is quite different from Tranquility: it only works with Kafka, but it doesn’t require any external processes and is able to guarantee exactly-once processing.
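For reference, the Kafka Indexing Service is driven by a supervisor spec submitted to the Overlord (POST to `/druid/indexer/v1/supervisor`). A trimmed-down example spec, with the dataSource, topic, dimensions, and broker address as placeholders you would replace with your own, looks roughly like:

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {"column": "timestamp", "format": "auto"},
        "dimensionsSpec": {"dimensions": ["page"]}
      }
    },
    "metricsSpec": [{"type": "count", "name": "count"}],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {"type": "kafka"},
  "ioConfig": {
    "topic": "metrics",
    "consumerProperties": {"bootstrap.servers": "kafka01:9092"},
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
```

The supervisor then manages the indexing tasks itself, which is why no separate Tranquility process is needed.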
There is an official Druid IRC chat, but I’m not sure how much traffic it gets these days (I have to admit I rarely check it, and usually check the mailing lists for discussions instead).
Thanks Gian, appreciate it.
On the Slack community channel: one piece of common feedback I got from people who adopted Druid, or tried it and then reverted, is “Druid is complicated to operate; it has too many daemons.” I noticed that most of the tech talks available on YouTube or at meetups focus on Druid use cases rather than deep dives into Druid internals. Any steps toward addressing this would be awesome.
The community Druid distribution is really set up for maximum flexibility, and the daemons are part of that. The idea is that the specialization can help you tune, scale, and debug each part independently, which is especially helpful in large, elastically scaled cloud environments. I’ll grant that it is a bit overkill for a simpler setup, though. For that you could try the Imply distribution, which is easier to get started with: https://imply.io/get-started (note also that Imply supports complex setups too; it’s just that the “default” is simple).