Connect real-time application to Druid through Kafka

We have a real-time application that constantly generates data, and a copy of that stream is forked and sent to a server where we want to run analytics with Druid. We'd prefer the Kafka indexing service over Tranquility, since Tranquility's windowPeriod interferes with our ingestion. However, the Kafka console producer is invoked on the command line through a bash script and takes its input from the command line (or from a file). How do we send real-time data to Kafka this way every time an event is produced? Is there a Tranquility-style way to open an endpoint on a port on the server so we can push events into it? We'd rather not install Kafka on every server we want to send data from.

I mean, if I could send events to Kafka programmatically, that would be great. Is there an alternative way to use the client? More like an API?

Hey Suhas,

There are lots of examples online that show how to use a Kafka producer to write events from your application, e.g. https://dzone.com/articles/kafka-producer-in-java
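
For instance, with the kafka-python client it would look roughly like this (a sketch only; the broker address, topic name, and event fields are placeholders):

```python
# Rough sketch using the kafka-python client (pip install kafka-python).
# Broker address, topic name, and event fields are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],                      # your Kafka broker(s)
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize dicts to JSON bytes
)

def send_event(event):
    # Call this from your application whenever an event is produced.
    producer.send("druid-events", value=event)  # topic your Druid supervisor reads from

send_event({"timestamp": "2018-01-01T00:00:00Z", "data1": "data"})
producer.flush()
```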

For posting to Kafka via HTTP there are a few projects that might be helpful, though I can't speak to their quality:

https://docs.confluent.io/1.0/kafka-rest/docs/intro.html
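
As a rough sketch (not something I've tested here), posting a single JSON event to the Confluent REST Proxy over HTTP looks something like this, assuming the proxy is running on its default port 8082 and "druid-events" is a placeholder topic:

```python
# Rough sketch: posting one JSON event to the Confluent Kafka REST Proxy over HTTP.
# Assumes the proxy is reachable at localhost:8082; "druid-events" is a placeholder topic.
import requests

payload = {"records": [{"value": {"data1": "data"}}]}
resp = requests.post(
    "http://localhost:8082/topics/druid-events",
    json=payload,
    headers={"Content-Type": "application/vnd.kafka.json.v1+json"},
)
resp.raise_for_status()
print(resp.json())  # offsets of the produced record(s)
```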

Kind regards,

Dylan

Hey Dylan,

Thanks for replying. I found a Python wrapper for the Kafka producer. Using it, I sent newline-separated JSON as a single string, the way you would with batch ingestion.

For example, here are three events:

'{"data1":"data"}\n{"data2":"data"}\n{"data3":"data"}\n'


The Kafka consumer receives the entire string unmodified, but it looks like Druid ingests only the first JSON event. This means I need to make a separate send() call for each event. I'm fine doing that, but is there a reason this happens?
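
To be concrete, what I mean by separate send calls is roughly this (kafka-python again; the broker address and topic name are placeholders):

```python
# Sketch: one send() per event instead of one newline-joined string.
# Broker address and topic name are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=["localhost:9092"])

events = [{"data1": "data"}, {"data2": "data"}, {"data3": "data"}]

# Before: a single Kafka message containing all three events joined by "\n".
# producer.send("druid-events", "\n".join(json.dumps(e) for e in events).encode("utf-8"))

# Now: one Kafka message per event.
for event in events:
    producer.send("druid-events", json.dumps(event).encode("utf-8"))

producer.flush()
```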