Correct way to ingest complex data into my Druid system

Hey Druids,

I need help ingesting nested data into Druid on an hourly basis.

The problem is that I have an array of key/value metric pairs that I need to pivot into columns, i.e. M1 and M2, which are keys inside the Metrics block, should become columns in Druid. These messages are written from Kafka to an S3 bucket every 15 minutes, with a folder structure like:

```
{druid-bucket}/date=2019-07-19/hr=11/
```

The messages are batched together and written as Avro files, and each folder contains all the files for that hour. There are around 200 million such events per hour across these files.

  1. What is the best way to ingest this data into Druid? AFAIK, this kind of flattening is not supported in Druid for Avro files.

  2. Also, how do I schedule this ingestion to run every 15 minutes or every hour?

The event structure looks like this:

```json
{
  "DIM-Metric": [
    {
      "D1": "Z",
      "D2": "Z",
      "D3": "Z",
      "D4": "Z",
      "Metrics": [
        {
          "Key": "M1",
          "Value": 12312312
        },
        {
          "Key": "M2",
          "Value": 12312312
        },
        {
          "Key": "M1",
          "Value": 123178
        }
      ],
      "timestamp": 1546428563000
    }
  ],
  "Dimensions": {
    "F": {
      "M": "5e331130",
      "N": "DUMMY",
      "O": "ggggg",
      "P": "6A31B0A6",
      "Q": "31485CE5"
    }
  }
}
```

I would need to create multiple rows from this one event, since the DIM-Metric field is an array; a rough sketch of the transformation I'm after is below.
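To make the target shape concrete, here is a rough Python sketch (plain Python, not a Druid spec) of the flattening I have in mind. The helper name `flatten_event` is just for illustration, and keeping the last value when a key repeats (M1 appears twice above) is an assumption; in reality those would probably need to be summed at ingestion.

```python
def flatten_event(event):
    """Turn one nested event into a list of flat rows, one per DIM-Metric entry."""
    rows = []
    # Top-level dimensions are shared by every row produced from this event.
    shared_dims = event.get("Dimensions", {}).get("F", {})
    for entry in event["DIM-Metric"]:
        row = {"timestamp": entry["timestamp"]}
        # Copy the plain dimensions (D1..D4) straight across.
        for key, value in entry.items():
            if key not in ("Metrics", "timestamp"):
                row[key] = value
        # Pivot the Metrics array: each "Key" becomes a column name.
        # Assumption: a repeated key keeps the last value seen.
        for metric in entry["Metrics"]:
            row[metric["Key"]] = metric["Value"]
        row.update(shared_dims)
        rows.append(row)
    return rows
```

For the sample event above this yields a single row (the DIM-Metric array has only one entry) with columns timestamp, D1..D4, M1, M2 and the M..Q dimensions; events with more entries in DIM-Metric would produce one row per entry. That is the row shape I'd want to end up with in Druid.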