Nested JSON input processing

Is it possible to process two differently structured nested JSON payloads and save them both in a single datasource in Druid? That is, I have a dataset of API requests and responses (each with a different structure) published into a Kafka topic, and I'm trying to ingest that data into a Druid datasource.

Is it possible to do that? Please help.

According to this documentation: Schema design tips · Apache Druid, Druid does not support nested dimensions; nested dimensions need to be flattened. As both the request and response are available in the same Kafka topic, they can be ingested into the same Druid datasource once the nested dimensions are flattened out. Dimensions from multiple Kafka topics can also be combined into a new single Kafka topic, which can then be ingested to create the Druid datasource.

The JSON parser used during ingestion can flatten a nested JSON structure, so if your messages look like:

{"timestamp":"2018-01-01T07:01:35Z","json1":{"name":"octopus","color":"yellow"},  "json2":{"location":1, "number":100}}

then you can ingest it into a single Druid datasource with a flattenSpec such as:

"inputFormat": {
        "type": "json",
        "flattenSpec": {
          "fields": [
            {
              "name": "json1.color",
              "type": "path",
              "expr": "$.json1.color"
            },
            {
              "name": "json1.name",
              "type": "path",
              "expr": "$.json1.name"
            },
            {
              "name": "json2.location",
              "type": "path",
              "expr": "$.json2.location"
            },
            {
              "name": "json2.number",
              "type": "path",
              "expr": "$.json2.number"
            }
          ]
        }
}

@Sergio_Ferragut Thank you for your answer.
The input is coming from a Kafka stream, and the dataset contains an API request and response (separate JSON objects) with no delimiter in between. The request and response have two different structures. I would want to store one API request together with its response as one event in the datasource, e.g.:

{
  "field1": {
    "a1": "value1",
    "b1": "value2"
  }
}

{
  "field2": {
    "a2": {"x": "y"},
    "b2": "value3"
  }
}

{

Api request for second event

}

{

Api response for the second event.

}

@sandeepmuthathi I don’t think there is an out-of-the-box solution for that. I think your best bet may be to preprocess the messages, converting from:

{<request JSON>} {<response JSON>}

into a single JSON object such as:

{ "request":{<request JSON>},"response":{<response JSON>} }

You can then use the flattenSpec to read the elements of the request and response fields into individual columns at ingestion time.
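As a rough illustration of that preprocessing step, here is a minimal Python sketch, assuming each Kafka message holds exactly one request object immediately followed by one response object with no delimiter (the function names and the `"request"`/`"response"` keys are my own choices, not anything Druid requires):

```python
import json

def split_concatenated_json(payload: str):
    """Yield the individual JSON objects in a string that contains
    several objects back to back with no delimiter between them."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(payload):
        # raw_decode cannot start on whitespace, so skip any gap first.
        while idx < len(payload) and payload[idx].isspace():
            idx += 1
        if idx >= len(payload):
            break
        obj, end = decoder.raw_decode(payload, idx)
        yield obj
        idx = end

def combine_request_response(payload: str) -> str:
    """Wrap a request/response pair into a single JSON object that a
    flattenSpec can then pull apart into columns at ingestion time."""
    request, response = split_concatenated_json(payload)
    return json.dumps({"request": request, "response": response})

# Two back-to-back objects with no delimiter, as described above.
raw = '{"field1":{"a1":"value1"}}{"field2":{"b2":"value3"}}'
print(combine_request_response(raw))
```

The combined message can then be republished to a new topic that Druid consumes.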

Hey @sandeepmuthathi,
Hope all is well. In Apache Druid 24.0, you can now ingest JSON objects directly and still benefit from automatic indexing of the nested fields. Check out the Nested Columns section in the docs for details.
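For example (a sketch based on the Druid 24.0 nested-columns docs; the `request` and `response` column names are just illustrative), the combined objects can be kept nested by declaring them as `json`-typed dimensions instead of flattening them:

```json
"dimensionsSpec": {
  "dimensions": [
    { "type": "json", "name": "request" },
    { "type": "json", "name": "response" }
  ]
}
```

The nested fields can then be queried with the JSON functions (e.g. `JSON_VALUE`) rather than pre-flattened columns.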
