Ingestion - Filtering on fields not appearing as dimensions


I’ve noticed that during ingestion, when I specify a filter section in the transformSpec, I can only filter on fields that are in the dimensions list. Even if a field is present in the raw data, the filter won’t consider it unless it also appears in the dimensions list.

This means I have to add a field to the Data Source’s dimensions even though I only need it for filtering and don’t care about it otherwise.

How can we work around this?

The transformSpec is applied after the initial row parsing (which is based on the specified dimensions), so I don’t think it’s possible to avoid adding the filter field to the dimension list. There isn’t a workaround for that, aside from reindexing the data afterwards to drop the column.
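
To make the current requirement concrete, here is a minimal sketch of the relevant part of an ingestion spec, built as a Python dict and printed as JSON. The field names "page" and "country" and the value "US" are made up for illustration. As far as I know the key inside transformSpec is "filter" (singular), and where dimensionsSpec sits in the full spec depends on the Druid version, so treat this as a fragment rather than a complete spec.

```python
import json

# Hypothetical field names: "page" is a dimension we actually want,
# "country" is only needed by the ingestion-time filter.
data_schema_fragment = {
    "dimensionsSpec": {
        # "country" is listed here only so that the filter can see it.
        "dimensions": ["page", "country"],
    },
    "transformSpec": {
        # Selector filter: keep only rows where country == "US".
        "filter": {
            "type": "selector",
            "dimension": "country",
            "value": "US",
        },
    },
}

print(json.dumps(data_schema_fragment, indent=2))
```

With this layout "country" ends up as a column in the Data Source, which is exactly the side effect described in the question.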

This makes me think it could be useful if Druid provided an option on the transformSpec filters that controls whether the filter applies pre-transform or post-transform.

Currently the filters are applied post-transform only. If pre-transform filters were supported, you could filter on a column and apply a transform to null out the filter column (if a column has all nulls, Druid will skip writing the column when building the final segment).
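
The proposed pre-transform behaviour can be illustrated with a small toy model in plain Python. This is not Druid code and does not use any Druid API. The function `ingest`, its parameters, and the sample rows are all hypothetical, and the model simply assumes the order "filter on raw values, then null out the column, then skip all-null columns" described above.

```python
def ingest(rows, keep, null_out):
    """Toy model of the proposed order: filter first, then transform."""
    # 1. Pre-transform filter: sees the raw value of every column.
    kept = [row for row in rows if keep(row)]
    # 2. Transform: overwrite the filter-only column with None.
    transformed = [{**row, null_out: None} for row in kept]
    # 3. Segment build: skip columns whose values are all null.
    names = transformed[0].keys() if transformed else []
    columns = [c for c in names if any(r[c] is not None for r in transformed)]
    return [{c: r[c] for c in columns} for r in transformed]


# Hypothetical sample rows; "country" is only needed for filtering.
rows = [
    {"page": "A", "country": "US"},
    {"page": "B", "country": "FR"},
    {"page": "C", "country": "US"},
]

result = ingest(rows, keep=lambda r: r["country"] == "US", null_out="country")
print(result)  # [{'page': 'A'}, {'page': 'C'}]
```

In this model the "country" column drives the filter but never reaches the output, which is the outcome the pre-transform option would be meant to enable.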