We have a use case where we are transitioning from the Batch + Realtime (Tranquility) paradigm to windowless Kafka Indexing Service ingestion of our event stream.
In the previous approach, we stored event data as JSON objects in Druid (leveraging Druid's efficient JSON handling), while in Kafka we stored it in protobuf format.
But now, for ingesting events into Kafka (which will then be consumed by the Kafka indexer in Druid), we have three options:
Could anyone please list the pros and cons of each? I can't seem to find good docs or comparisons online.
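For context on why the format matters to us: the payload-size gap between JSON and a compact binary encoding is significant at our scale. Below is a rough, illustrative sketch (the event fields are hypothetical, and `struct.pack` is only a stand-in for protobuf, which adds small per-field tags but is similar in spirit):

```python
import json
import struct

# Hypothetical event; our real schema is a protobuf message.
event = {"ts": 1700000000, "user_id": 123456789, "action": 7, "value": 0.25}

# Compact JSON encoding (no whitespace).
json_bytes = json.dumps(event, separators=(",", ":")).encode("utf-8")

# Fixed binary layout for the same four fields:
# two uint64s, one int32, one float64.
binary_bytes = struct.pack(
    "<QQid", event["ts"], event["user_id"], event["action"], event["value"]
)

print(len(json_bytes), len(binary_bytes))  # JSON is roughly 2x larger here
```

The JSON form repeats every field name in every message, while the binary form carries only the values, which is the main reason our Kafka payloads have been so much smaller than the equivalent Druid JSON rows.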
We are dealing with the following numbers and configuration:

Current load:
- Average events per day: 2.78 million
- Average size of a single event: 250 bytes (protobuf message)

However, we are targeting 100 million events per hour.
Our segment size is a 1-hour time interval.
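To make the target load concrete, here is the back-of-envelope sizing we are planning around, assuming the ~250-byte average event size carries over to the new stream:

```python
# Back-of-envelope sizing for the target load.
events_per_hour = 100_000_000
avg_event_bytes = 250  # assumed to hold at the new volume

bytes_per_hour = events_per_hour * avg_event_bytes      # 25 GB per hour (per segment interval)
events_per_second = events_per_hour / 3600              # ~27.8k events/s sustained
mb_per_second = bytes_per_hour / 3600 / 1_000_000       # ~6.9 MB/s ingest bandwidth

print(f"{bytes_per_hour / 1e9:.1f} GB/hour, "
      f"{events_per_second:,.0f} events/s, {mb_per_second:.1f} MB/s")
```

So each 1-hour segment interval corresponds to roughly 25 GB of raw protobuf on the Kafka side; a more verbose format would multiply that accordingly.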