Renaming and applying transformations at ingestion time

Hi,

We are batch ingesting avro files with druid-avro-extensions.

It would be very interesting for us to rename columns and possibly apply some transformations (concatenating fields, etc.) at ingestion time.

We see in the code that io.druid.query.dimension.DimensionSpec is much richer in functionality than io.druid.data.input.impl.DimensionsSpec.

Is it possible to apply some logic, such as simple column name renaming, at ingestion time? In all the ingestion spec examples we have seen, dimensionsSpec.dimensions is always a list of strings (in accordance with the code at io.druid.data.input.impl.DimensionsSpec).

Thanks in advance.

The simplest way would be to customize the parsing by writing your own InputRowParser, similar to https://github.com/druid-io/druid/blob/master/extensions-core/avro-extensions/src/main/java/io/druid/data/input/AvroHadoopInputRowParser.java . In it you can parse the Avro GenericRecord into an InputRow however you like. You can write it in a separate extension.

You can refer to existing druid extensions or see a sample at https://github.com/himanshug/druid-pwd-provider-extn-sample .
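
As a rough illustration of the kind of per-record logic such a custom parser could apply, here is a minimal sketch of renaming fields and concatenating two of them. It deliberately uses plain Java maps rather than the Druid or Avro APIs, so it runs on its own. The class name, method names, and field names (fname, lname, full_name, and so on) are all made up for the example. In a real extension the input map would presumably be filled from the GenericRecord's fields and the result turned into an InputRow inside your parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: plain-Java stand-in for the transformation step
 * a custom InputRowParser might perform. Nothing here is Druid or Avro API.
 */
public class RecordTransformSketch {

    /** Copies all fields into a new map, renaming any key listed in renames. */
    public static Map<String, Object> rename(Map<String, Object> record,
                                             Map<String, String> renames) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            // Keep the original name when no rename is configured for it.
            String name = renames.getOrDefault(e.getKey(), e.getKey());
            out.put(name, e.getValue());
        }
        return out;
    }

    /** Adds a derived field by joining two existing fields with a separator. */
    public static void concat(Map<String, Object> row, String target,
                              String left, String right, String separator) {
        row.put(target, String.valueOf(row.get(left)) + separator
                + String.valueOf(row.get(right)));
    }

    public static void main(String[] args) {
        // Stand-in for the fields read from one Avro record (assumed names).
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("fname", "Ada");
        record.put("lname", "Lovelace");
        record.put("ts", 1451606400000L);

        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("fname", "first_name");
        renames.put("lname", "last_name");

        Map<String, Object> row = rename(record, renames);
        concat(row, "full_name", "first_name", "last_name", " ");
        System.out.println(row);
    }
}
```

With something like this in the parser, the dimensions listed in the ingestion spec would refer to the new names (first_name, last_name, full_name in this made-up case) rather than the original Avro field names.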

– Himanshu