[druid-user] Re: ProtoBuf Parser

Thanks Himanshu

To confirm, my “parser” setup in the spec file that gave the exceptions in my previous post is:

"parser" : {
  "type" : "protobuf",
  "descriptor" : "MetricsRecord.proto",
  "parseSpec" : {
    "format" : "json",
    "timestampSpec" : {
      "column" : "timestamp",
      "format" : "millis"
    },
    "dimensionsSpec" : {
      "dimensions" : ["userId", "sourceId", "deviceId"],
      "dimensionExclusions" : [],
      "spatialDimensions" : []
    }
  }
}

I'll double-check that "MetricsRecord.proto" is in the classpath; it should be.

I’ll also see what I can find in the code, thanks for the references.

After some investigation, I found the following solution!

First, the type inside the parser must be 'protobuf', as already said. Setting it to 'protoBuf' will lead to the wrong parser being used.

Then you have to add a parameter 'descriptor', which points to the protocol buffers descriptor file. This is not the .proto file itself; you have to generate it with the protoc compiler using

protoc --descriptor_set_out=mymessages.desc mymessages.proto
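For reference, the .proto behind such a descriptor might look roughly like the sketch below; the message name, field names, and types are only an illustration, reusing the columns from the spec earlier in this thread.

syntax = "proto2";

// Hypothetical message definition; the field names must match the columns
// you reference in the Druid parseSpec (case sensitive).
message MetricsRecord {
  optional int64  timestamp = 1;  // epoch millis, used by the timestampSpec
  optional string userId    = 2;  // dimension
  optional string sourceId  = 3;  // dimension
  optional string deviceId  = 4;  // dimension
  optional double value     = 5;  // example metric field (not in the original spec)
}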

Then move this .desc file to the path containing your realtime spec file and set

"parser" : {
  "type" : "protobuf",
  "descriptor" : "mymessages.desc",
  "parseSpec" : {
    "format" : "json"
  }
}

Last but not least, you have to add the path containing the .desc file to your java classpath (and this is important, otherwise you will get an exception from getFile()) with

-classpath <dir1>/<dir2>

or add it to an existing -classpath with ':<dir1>/<dir2>'

where <dir1> and <dir2> are, for example, two directories starting from your druid base directory (where you have your config and lib directories; note: don't add the < or > :wink: )
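Concretely, the realtime node start command would then look something like the sketch below; the memory settings, paths, and spec file name are just examples, not taken from this thread.

# start a standalone realtime node with the descriptor directory on the classpath
java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -Ddruid.realtime.specFile=<dir1>/<dir2>/realtime.spec \
  -classpath config/_common:config/realtime:lib/*:<dir1>/<dir2> \
  io.druid.cli.Main server realtime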

After starting your realtime node you should get data into your database. The configuration in parseSpec is done like a JSON object, so be careful to use the same names for your 'columns' as in your .proto file (case sensitive).
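Put together, a parser block whose columns line up with such a .proto might look like the sketch below; the descriptor name and the dimension/timestamp columns are simply the ones from the examples earlier in this thread.

"parser" : {
  "type" : "protobuf",
  "descriptor" : "mymessages.desc",
  "parseSpec" : {
    "format" : "json",
    "timestampSpec" : {
      "column" : "timestamp",
      "format" : "millis"
    },
    "dimensionsSpec" : {
      "dimensions" : ["userId", "sourceId", "deviceId"],
      "dimensionExclusions" : [],
      "spatialDimensions" : []
    }
  }
}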

Hi Matthias,

I added https://github.com/druid-io/druid/pull/1455 to fix the mistake in the docs. Thanks for this catch.

Do you mind contributing some of your findings back to the docs? This will help others in the future. The content/ingestion/index.md doc is a good place to add this information.

Hi Fangjin,

Sure, I will add this to the docs in the next few days.

Hi,

I wonder what the current status is for ingesting batch data via protobuf.

  1. Does it work?

  2. What is the preferred batch data format? I thought protobuf would be more efficient.

  3. If it works, where should this xxx.proto file be located?

Thanks,

  • tdingus