Custom parsing while ingesting from a Kinesis stream into Druid


I am trying to ingest Snowplow data from a Kinesis stream into Druid, using the Kinesis extension provided by Druid. Unfortunately, the data in the stream is not in any of the formats Druid can parse (CSV/TSV/JSON), so I need to parse the records from the Kinesis stream myself before loading them into Druid.

Sample Snowplow data -

So, right now, what I am thinking is updating the function:getRecordRunnable in and writing a parser between pulling the data and pushing it to OrderedPartitionableRecord. Is this the right approach? If not, please suggest the right way to proceed.

Open to feedback/suggestions.



Hi Paras,

There are two options:

  1. Run an ETL process (a stream processor, outside of Druid) that parses the data into JSON or CSV and writes it to another topic. Then have the Druid task read from this new topic.

  2. Write a custom Druid extension that does the parsing. Please refer to the link below, especially ExampleByteBufferInputRowParser, for more information.
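
For option 1, the transformation step might look roughly like the sketch below. It only covers converting one raw record into a JSON line; reading from the source stream and writing to the new topic are left out. The record layout is an assumption made purely for illustration (pipe-separated key=value pairs), and the names to_json_line and decode_pipe_pairs are hypothetical, so the decoder would have to be replaced with one that understands the actual Snowplow payload.

```python
import json
from typing import Any, Callable, Dict, Optional


def decode_pipe_pairs(raw: bytes) -> Dict[str, Any]:
    """Hypothetical decoder: assumes records look like b"key=value|key=value".

    This layout is an illustration only, not the real Snowplow format.
    """
    text = raw.decode("utf-8").strip()
    row: Dict[str, Any] = {}
    for pair in text.split("|"):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            # No "=" found, or an empty key: treat the record as malformed
            raise ValueError(f"malformed field: {pair!r}")
        row[key] = value
    return row


def to_json_line(raw: bytes,
                 decode: Callable[[bytes], Dict[str, Any]]) -> Optional[str]:
    """Turn one raw record into a JSON string, or None if it cannot be decoded."""
    try:
        row = decode(raw)
    except ValueError:
        # Covers bad UTF-8 (UnicodeDecodeError is a ValueError) and bad fields
        return None
    return json.dumps(row)


if __name__ == "__main__":
    # Usage: a well-formed record becomes JSON, a malformed one is skipped
    print(to_json_line(b"app_id=shop|event=page_view", decode_pipe_pairs))
    print(to_json_line(b"not a valid record", decode_pipe_pairs))
```

Keeping the decoder as a separate function means the part that knows the payload format can be swapped out without touching the code that emits JSON, and malformed records are dropped instead of stopping the stream processor.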


Muthu Lalapet.