Kafka indexing: Protobuf with extension

Hey Guys,

We are quite new to Druid and are currently evaluating it against our old InfluxDB cluster.

For Druid, we are ingesting data from Kafka using Imply’s Kafka indexing service (https://imply.io/docs/latest/tutorial-kafka-indexing-service.html). This works fine for simple JSON data. However, most of our data in Kafka is in binary protobuf format, and these protobuf messages are quite complex, with extensions (https://developers.google.com/protocol-buffers/docs/proto#extensions).

I looked around for a way to make the Kafka indexing service support this kind of data, without success. We wanted to reach out to the community to check whether anybody has tried doing this and, if so, could share their approach to configuring Druid to index this kind of data.

-Suhas

Hey Suhas,

Druid does support protobuf data through the “protobuf” parser. It appears to be undocumented (not sure why) and to only support flat formats. This is the code that can parse protobuf messages: https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/data/input/ProtoBufInputRowParser.java (check out the method “buildStringKeyMap”).

There was some other work on this in the past:

https://github.com/druid-io/druid/pull/2354

https://github.com/druid-io/druid/pull/3509

https://github.com/druid-io/druid/issues/3505

https://github.com/druid-io/druid/pull/3508

I’m not sure if knoguchi is still working on a protobuf3 extension, but that seems to be the direction things were going.

Thanks for the reply, Gian. I am aware that Druid supports protobuf data. The problem, as you mentioned, is that it only supports simple formats, and most enterprises have complex formats that involve extensions.

Thanks for sharing the other work. It looks like none of them provides a solution for enterprise formats :frowning:

-Suhas

I will resume the protobuf extensions work shortly.

Kenji Noguchi

I created https://github.com/druid-io/druid/pull/4039
Please let me know if this satisfies your use case.

The PB “extension” should work as long as Google’s protobuf-java-util package can convert it to nested JSON.
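
For anyone who wants to check this locally, here is a minimal sketch (not the code from the pull request) of the kind of conversion meant here: decoding a protobuf payload and printing it as nested JSON with JsonFormat from protobuf-java-util. The Event and Host messages, their fields, and the class name are hypothetical and are built in code only so the example needs no .proto file. It assumes protobuf-java and protobuf-java-util (with its Gson dependency) are on the classpath. It uses a plain nested message, so whether a particular extension converts the same way still has to be verified against your own schema.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.util.JsonFormat;

public class ProtoToJsonSketch {

    // Hypothetical schema, defined in code instead of a .proto file:
    //   message Host  { optional string name = 1; }
    //   message Event { optional string metric = 1; optional Host host = 2; }
    static FileDescriptor buildSchema() throws Exception {
        DescriptorProto host = DescriptorProto.newBuilder()
            .setName("Host")
            .addField(FieldDescriptorProto.newBuilder()
                .setName("name").setNumber(1)
                .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                .setType(FieldDescriptorProto.Type.TYPE_STRING))
            .build();
        DescriptorProto event = DescriptorProto.newBuilder()
            .setName("Event")
            .addField(FieldDescriptorProto.newBuilder()
                .setName("metric").setNumber(1)
                .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                .setType(FieldDescriptorProto.Type.TYPE_STRING))
            .addField(FieldDescriptorProto.newBuilder()
                .setName("host").setNumber(2)
                .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                .setType(FieldDescriptorProto.Type.TYPE_MESSAGE)
                .setTypeName(".sketch.Host"))
            .build();
        FileDescriptorProto file = FileDescriptorProto.newBuilder()
            .setName("sketch.proto")
            .setPackage("sketch")
            .addMessageType(host)
            .addMessageType(event)
            .build();
        return FileDescriptor.buildFrom(file, new FileDescriptor[] {});
    }

    // Stand-in for the bytes that would arrive as a Kafka message value.
    static byte[] samplePayload() throws Exception {
        FileDescriptor schema = buildSchema();
        Descriptor eventType = schema.findMessageTypeByName("Event");
        Descriptor hostType = schema.findMessageTypeByName("Host");
        DynamicMessage host = DynamicMessage.newBuilder(hostType)
            .setField(hostType.findFieldByName("name"), "web-01")
            .build();
        return DynamicMessage.newBuilder(eventType)
            .setField(eventType.findFieldByName("metric"), "cpu")
            .setField(eventType.findFieldByName("host"), host)
            .build()
            .toByteArray();
    }

    // Decodes the binary payload against the schema and prints it as JSON;
    // the nested Host message becomes a nested JSON object.
    static String toJson(byte[] payload) throws Exception {
        Descriptor eventType = buildSchema().findMessageTypeByName("Event");
        DynamicMessage message = DynamicMessage.parseFrom(eventType, payload);
        return JsonFormat.printer().print(message);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toJson(samplePayload()));
    }
}
```

If your own messages come out of JsonFormat as the nested JSON you expect, that is a reasonable first sign that the approach described above can handle them.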

-kenji