Confused between Protobuf/Avro/JSON for Kafka Indexing Service Event Ingestion in Druid


We have a use case where we are transitioning from the Batch + Realtime (Tranquility) paradigm to windowless Kafka Indexing Service ingestion of our event stream.

In the previous approach, we stored event data as JSON objects in Druid (leveraging Druid's efficient JSON handling), while in Kafka we stored it in protobuf format.

But now, for ingesting events into Kafka (which will then be consumed by the Kafka indexer in Druid), we have three options:

  1. Use JSON

  2. Use Avro

  3. Use Protobuf
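For context, the choice mostly shows up in the supervisor spec's `inputFormat`. A minimal sketch for the JSON case (the topic name and broker address here are placeholders, and I'm omitting `dataSchema`/`tuningConfig`; protobuf would instead need the `druid-protobuf-extensions` loaded and a `protobuf` input format with a bytes decoder):

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "events",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka:9092" }
    }
  }
}
```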

Could anyone please list the pros and cons of each? I haven't been able to find good docs or a comparison online.

We have the following numbers and configuration to deal with:

Current load:

  • Average events per day: 2.78 million.

  • Average size of a single event: 250 bytes as a protobuf message.

However, we are targeting 100 million events per hour.

Our segment size is a 1-hour time interval.
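To put the target in perspective, here is the back-of-envelope throughput, assuming the 250-byte protobuf payload stays representative at the new scale (JSON encodings of the same events would typically be a few times larger, Avro usually close to protobuf):

```python
# Back-of-envelope throughput at the target ingestion rate.
TARGET_EVENTS_PER_HOUR = 100_000_000
PROTO_BYTES_PER_EVENT = 250  # measured average from the current load

events_per_sec = TARGET_EVENTS_PER_HOUR / 3600
proto_mb_per_sec = events_per_sec * PROTO_BYTES_PER_EVENT / 1_000_000

print(f"{events_per_sec:,.0f} events/s, ~{proto_mb_per_sec:.1f} MB/s in protobuf")
```

So the format choice changes both the Kafka bandwidth and how much parsing work the indexing tasks do per segment hour.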


Pravesh Gupta