Some Confusion About Druid Streams, Firehoses, and the Django Web Framework

Hello.

I am looking to ingest analytics data into a Druid cluster from a website powered by the Django web framework, but I’m a little confused as to how Druid streams and firehoses work. Is a stream some sort of officially recognized data type outside of Druid, or is it jargon specific to the Druid data store? Same question for a firehose: is it a specific data type, structure, or object with an established meaning in the computer science community? Googling “streams” and “firehoses” has, probably unsurprisingly, been less than helpful.

If anyone is familiar with Django, I was planning on creating a custom middleware within the Django stack that would transparently take the incoming request, create a Django-backed database object, serialize that object into JSON, and then push that data to Druid in one of a number of ways. I want the middleware to operate in its own execution context, so that there is absolutely no delay as far as the Django-powered website is concerned. The middleware is only concerned with the HTTP headers coming from the client, and it should push the data without needing any sort of confirmation. If there is a way to push the data from inside the middleware’s executing Python code, that would be optimal. Otherwise, I could write the data to a file and set up a firehose to ingest the data from the directory where I store those JSON-serialized objects.
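To make the idea concrete, here is a rough sketch of the kind of middleware I have in mind. It is only an illustration under my own assumptions, not something I know to be the right approach: the class name `AnalyticsMiddleware`, the helper `post_json`, the event fields, and the `INGEST_URL` endpoint are all made up for this example, and I skip the database object and build a plain dict instead. The request thread only drops an event onto an in-memory queue, and a background thread does the actual HTTP POST.

```python
import json
import queue
import threading
import time
import urllib.request

# Placeholder endpoint (an assumption): replace with whatever actually
# accepts the JSON events in your setup.
INGEST_URL = "http://localhost:8200/v1/post/pageviews"


def post_json(event, url=INGEST_URL):
    """Send one event as an HTTP POST with a JSON body."""
    body = json.dumps(event).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=2).close()


class AnalyticsMiddleware:
    """New-style Django middleware: a callable that wraps get_response."""

    def __init__(self, get_response, send=post_json):
        self.get_response = get_response
        self.send = send  # injectable so it can be swapped or tested
        self.events = queue.Queue(maxsize=10000)
        # Daemon thread: it will not keep the process alive on shutdown.
        threading.Thread(target=self._drain, daemon=True).start()

    def __call__(self, request):
        event = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "path": request.path,
            "user_agent": request.META.get("HTTP_USER_AGENT", ""),
            "referer": request.META.get("HTTP_REFERER", ""),
        }
        try:
            # Never block the request: if the queue is full, drop the event.
            self.events.put_nowait(event)
        except queue.Full:
            pass
        return self.get_response(request)

    def _drain(self):
        """Background loop: send queued events one at a time."""
        while True:
            event = self.events.get()
            try:
                self.send(event)
            except Exception:
                pass  # fire and forget: failed sends are silently dropped
            finally:
                self.events.task_done()
```

If this is roughly right, I would list the class in `MIDDLEWARE` in the Django settings. The obvious trade-off is that events are lost if the process dies or the queue fills up, which I think is acceptable for analytics.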

But here’s the thing: Is this the correct way of doing it? What exactly is a stream and a firehose? There is some mention of Tranquility and Kafka in the documentation on druid.io, but I have no idea what those two things are, and their own documentation is more confusing than anything else. Is a stream just the process of posting the data with a curl request? I find it hard to believe that such a sophisticated system would be built entirely on top of curl. Is there documentation that explains what streams are and how to create them outside of Druid? What about firehoses?

In my mind, it seems really easy. But then there’s this whole layer of complex documentation surrounding the concepts of streams and firehoses, and I don’t understand any of it, or what it’s all for. Can somebody please explain this to me like I’m 5?

Thank you very much for your time.