Custom parsing while ingesting from a Kinesis stream into Druid

Hi,

I am trying to ingest Snowplow data from a Kinesis stream into Druid, using the Kinesis extension that ships with Druid. Unfortunately, the data in the stream is not in any of the formats Druid can parse (CSV/TSV/JSON), so I need to parse the records myself before they are stored in Druid.

Sample Snowplow data: https://discourse.snowplowanalytics.com/t/issue-reading-enriched-data-thrift/1384/4

My current plan is to modify the getRecordRunnable function in KinesisRecordSupplier.java and add a parser between the point where the data is pulled from the stream and the point where it is pushed into an OrderedPartitionableRecord. Is this the right approach? If not, please suggest the right way to proceed.

Open for feedback/suggestions.

Regards,

Paras

Hi Paras,

There are two options:

  1. Run an ETL process (a stream processor outside of Druid) that parses the records into JSON or CSV and writes them to another stream. The Druid task then reads from this new stream.

  2. Write a custom Druid extension that does the parsing. For more information, see the link below, especially the ExampleByteBufferInputRowParser class.

https://github.com/implydata/druid-example-extension
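
Either option needs the same core step: turning the raw bytes of one Kinesis record into named fields. Below is a minimal sketch of that step in plain Java with no Druid dependencies. It assumes, purely for illustration, that each record is a single line of UTF-8 tab-separated text; if the payload is Thrift-encoded, a Thrift deserializer would take its place. The class name TsvRecordParser and the four column names are hypothetical, and a real Snowplow event has many more fields. In option 2, logic like this would sit inside the extension's row parser; in option 1, it would sit inside the stream processor before the record is written out as JSON.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TsvRecordParser {
    // Hypothetical column names, used only for this illustration.
    static final List<String> COLUMNS =
            List.of("app_id", "platform", "collector_tstamp", "event");

    // Converts the bytes of one record into a column -> value map.
    static Map<String, String> parse(ByteBuffer payload) {
        // duplicate() leaves the caller's buffer position untouched.
        String line = StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
        // Limit -1 keeps trailing empty fields so positions stay aligned.
        String[] values = line.split("\t", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.size() && i < values.length; i++) {
            // Empty fields are skipped so they are simply absent from the row.
            if (!values[i].isEmpty()) {
                row.put(COLUMNS.get(i), values[i]);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        ByteBuffer record = ByteBuffer.wrap(
                "shop\tweb\t2019-06-01 12:00:00.000\tpage_view"
                        .getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(record));
    }
}
```

Keeping the parsing in a small class of its own means it can be unit tested without a running Druid cluster or Kinesis stream.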

Regards,

Muthu Lalapet.