Realtime ingestion

I’m just getting started with Druid. The standard install was fairly easy, but I’m having trouble setting up realtime ingestion. I’ve been reading the documentation and still find some aspects confusing, so I was hoping someone could help me out. Some of the information seems to be old and to describe a previous way of doing things, or maybe I don’t quite understand what’s required.

So there’s Tranquility but I’m not clear what the status is for that. It doesn’t appear to have had a release since June, 2016. There’s a branch for 0.10 but there doesn’t seem to have been much work on that.

Alternatively, there is the existing indexing service, or standalone realtime nodes. (I’ve found it somewhat confusing trying to sort out services and nodes: which nodes provide which service, and which services are even available. The documentation seems to mention them without any section explaining what they are.) The documentation only mentions some limitations associated with standalone realtime nodes. I’m assuming that there must be some advantage to choosing standalone realtime nodes over the existing indexing service. Is it because it scales better?

I was a little confused that there are startup scripts for all the other node types but none for a realtime node. No biggie, but the only thing the documentation gives for running one is “Running: io.druid.cli.Main server realtime”. It’s not a big deal to get that running the same way as the other nodes, but I just wanted to make sure that’s the right thing to do. I also don’t quite understand the firehose/plumber. Is that something specific to realtime nodes, or does it apply to the existing indexing service as well?

I’m trying to set up a Kafka 0.8 consumer with the kafka-eight extension, but the documentation seems to include only a portion of the spec file, under “firehose”, and I’ve seen some other examples that seem to use kafka-eight but don’t use a firehose section.

Any feedback would be greatly appreciated.

So there’s Tranquility but I’m not clear what the status is for that. It doesn’t appear to have had a release since June, 2016. There’s a branch for 0.10 but there doesn’t seem to have been much work on that.

The documentation only mentions some limitations associated with standalone realtime nodes. I’m assuming that there must be some advantage to choosing standalone realtime nodes over the existing indexing service. Is it because it scales better?

Tranquility and the standalone realtime nodes are older methods of realtime ingestion. As you’ve pointed out, development on them isn’t very active currently.

The most actively developed system for realtime ingestion presently is the Kafka indexing service extension:

http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html

I would recommend trying that out if possible.
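
To give a rough idea of what that involves: with the Kafka indexing service you submit a "supervisor spec" to the overlord rather than running a realtime node with a firehose. Below is a minimal sketch in Python that builds such a spec and posts it. The datasource name, topic, column names, broker address and overlord URL are all placeholders I made up for illustration, and the helper function names are hypothetical, so please check the field list against the page linked above for your Druid version. Also note that, as far as I recall, that page calls for newer Kafka brokers than 0.8, so it is worth confirming your broker version there first.

```python
import json
import urllib.request


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers):
    """Build a minimal Kafka supervisor spec as a plain dict.

    All column names below ("timestamp", "page", "user") are assumptions;
    replace them with the fields present in your own events.
    """
    return {
        "type": "kafka",
        "dataSchema": {
            "dataSource": datasource,
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {"column": "timestamp", "format": "auto"},
                    "dimensionsSpec": {"dimensions": ["page", "user"]},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "NONE",
            },
        },
        "tuningConfig": {"type": "kafka"},
        "ioConfig": {
            "topic": topic,
            "consumerProperties": {"bootstrap.servers": bootstrap_servers},
            "taskCount": 1,
            "replicas": 1,
            "taskDuration": "PT1H",
        },
    }


def submit_supervisor(spec, overlord_url="http://localhost:8090"):
    """POST the spec to the overlord's supervisor endpoint.

    The default URL is an assumption (a local overlord on port 8090).
    """
    request = urllib.request.Request(
        overlord_url + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the spec so it can be reviewed before it is submitted.
    spec = build_kafka_supervisor_spec("events", "events", "localhost:9092")
    print(json.dumps(spec, indent=2))
```

Running the script only prints the JSON. Calling submit_supervisor(spec) would send it to the overlord, which then manages the Kafka indexing tasks for you.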

Thanks,

Jon