Protobuf batch ingestion

Good day,

Is it possible to batch ingest a list of serialised protobuf messages saved to a file? According to the documentation a "Protobuf Parser" exists; it requires the parser type to be "protobuf" and the parseSpec to be a JSON object of format timeAndDims. Does this mean that each serialised protobuf message should be wrapped in JSON, with the time explicitly specified there, or should the timestamp be carried inside the protobuf message itself?

My current attempt does not seem to work.
Ingestion spec:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "proto",
      "parser": {
        "type": "protobuf",
        "descriptor": "description.desc",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "userId"
            ],
            "dimensionExclusions": [
              "eventId",
              "url"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": [
          "2015-03-01/2015-04-01"
        ]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/var/data/binary"
      }
    }
  }
}

where "/var/data/binary" is the file containing the serialised protobuf messages, one message per line.
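For concreteness, here is roughly how such a file can be produced (a minimal Java sketch using protobuf-java; the generated class name tutorial.PersonOuterClass.Person is an assumption based on protoc's default output for a file named person.proto):

import java.io.FileOutputStream;
import java.io.IOException;
import tutorial.PersonOuterClass.Person;

public class WriteMessages {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream out = new FileOutputStream("/var/data/binary")) {
            Person person = Person.newBuilder()
                    .setEventId("event-1")                 // field values are placeholders
                    .setTimestamp("2015-03-15T12:00:00Z")
                    .setUserId("user-1")
                    .setUrl("http://example.com")
                    .build();
            // One serialised message per line, as described above. Note that the
            // raw bytes may themselves contain 0x0A (newline), so this framing
            // can split a message mid-record.
            out.write(person.toByteArray());
            out.write('\n');
        }
    }
}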

And the proto file, containing the timestamp as a string:
package tutorial;

message Person {
  required string eventId = 1;
  required string timestamp = 2;
  required string userId = 3;
  required string url = 4;
}
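The descriptor referenced in the spec above can be generated with protoc along these lines (assuming the proto file is saved as person.proto):

protoc --include_imports --descriptor_set_out=description.desc person.proto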

Could you please provide assistance, or a working example of how to batch ingest a file containing serialised protobuf messages?

Hi Louw, the protobuf code was community-contributed and should really live in its own module. Have you resolved this?
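In the meantime, one thing worth checking: a serialised protobuf message is raw binary and can legally contain newline bytes, so a newline-delimited file may split records. If you control the writer, protobuf-java's length-delimited helpers (writeDelimitedTo / parseDelimitedFrom) are a common way to frame multiple messages in one file, though whether the community-contributed parser reads that framing is worth confirming. A minimal round-trip sketch, reusing the assumed tutorial.PersonOuterClass.Person class from above:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import tutorial.PersonOuterClass.Person;

public class DelimitedRoundTrip {
    public static void main(String[] args) throws IOException {
        // Write: each call prefixes the message with its varint-encoded length.
        try (FileOutputStream out = new FileOutputStream("/var/data/binary")) {
            Person.newBuilder()
                    .setEventId("event-1")
                    .setTimestamp("2015-03-15T12:00:00Z")
                    .setUserId("user-1")
                    .setUrl("http://example.com")
                    .build()
                    .writeDelimitedTo(out);
        }
        // Read back: parseDelimitedFrom returns null at end of stream.
        try (FileInputStream in = new FileInputStream("/var/data/binary")) {
            Person p;
            while ((p = Person.parseDelimitedFrom(in)) != null) {
                System.out.println(p.getUserId() + " @ " + p.getTimestamp());
            }
        }
    }
}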