Clarification needed on realtime indexing with the EventReceiverFirehose

If I want to push new events into Druid using the EventReceiverFirehose, what does that process look like?

From what I gather, you create a realtime index task by communicating with an overlord node. After that, what do you do? When and to which node do you send the actual event data? Is the realtime node necessary in this situation or is the indexing service/overlord sufficient? I suppose I could also rephrase this question to ask what exactly Tranquility does.

Thanks in advance for any insight here.

Hi Letat,

Your understanding of how realtime indexing tasks work is correct. After you submit a realtime task with an EventReceiverFirehose to the overlord, the overlord assigns the task to a middle manager with available capacity, which then spawns a local Peon process to handle ingestion of the data. You then send the events by HTTP POST to an endpoint that the Peon exposes on that middle manager node, of the form:

http://<peonHost>:<port>/druid/worker/v1/chat/<eventReceiverServiceName>/push-events/
(see EventReceiverFirehose on http://druid.io/docs/latest/ingestion/firehose.html)
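To make the push step concrete, here is a minimal Python sketch of posting a batch of events to that endpoint. It assumes the events are sent as a JSON array with a Content-Type of application/json; the helper names, host, port, service name and event fields are placeholders I made up for illustration, so substitute the values from your own task spec.

```python
import json
import urllib.request


def build_push_request(peon_host, port, service_name, events):
    """Build (but do not send) a POST request carrying a batch of events.

    The URL follows the push-events form shown above. Sending the batch as
    a JSON array is an assumption of this sketch.
    """
    url = (
        f"http://{peon_host}:{port}/druid/worker/v1/chat/"
        f"{service_name}/push-events/"
    )
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_events(peon_host, port, service_name, events, timeout=10):
    """Send the batch and return the raw response body as text."""
    request = build_push_request(peon_host, port, service_name, events)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

You would call something like push_events("peon.example.com", 8100, "my-service", [{"timestamp": "2015-01-01T00:00:00Z", "page": "home"}]) once the task is running. Note that the host and port are those of the Peon handling the task, which is part of what Tranquility figures out for you.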

In this ingestion flow, the realtime node is not involved; only the indexing service nodes (the overlord, the middle managers and their Peons) are.
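For completeness, the first step (submitting the realtime task to the overlord) can be sketched the same way. This assumes the overlord accepts task specs via POST at /druid/indexer/v1/task; the helper names, host and port are hypothetical, and the task spec itself is left to the caller because its contents depend on your data source and Druid version.

```python
import json
import urllib.request


def build_task_request(overlord_host, port, task_spec):
    """Build (but do not send) a POST request submitting a task spec.

    task_spec is a dict holding your realtime task definition, including
    the EventReceiverFirehose section. It is supplied by the caller and is
    not filled in by this sketch.
    """
    url = f"http://{overlord_host}:{port}/druid/indexer/v1/task"
    body = json.dumps(task_spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(overlord_host, port, task_spec, timeout=10):
    """Submit the task and return the overlord's raw response as text."""
    request = build_task_request(overlord_host, port, task_spec)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Only after the task has been accepted and a Peon is running does the push-events endpoint become available for that task.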

Tranquility handles the complexity of creating these indexing tasks so that segments are generated for the right time periods, and of pushing data to the correct tasks at the correct nodes. It also makes it easy to handle ingestion-time partitioning and replication, both of which involve creating and coordinating multiple indexing tasks. If you can use Tranquility in your workflow, I highly recommend doing so, as it will simplify your configuration requirements and minimize ingestion-related issues.