Handling/Using NaN

Hi all,

I am ingesting row-based data in which the value of any given column (string or numeric) may be null or NaN.

Each row is timestamped and has multiple columns. I have Druid set up to ingest streaming data from a Kafka topic.

In Druid, I’d like to be able to store string or numeric dimensions that are occasionally null/NaN, but so far Druid gives me errors when I try to ingest such rows from Kafka.

Is there a typical or recommended approach to working with null/NaN values?


There are two ways to handle this, depending on your requirements:

You can filter out the entire row when a column to be aggregated has null/NaN values. Please refer to the transform spec documentation on how to filter those rows: https://druid.apache.org/docs/latest/ingestion/transform-spec.html#filtering
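As a rough sketch of the first option, the snippet below generates the JSON for a `transformSpec` whose filter keeps only rows where the listed columns are non-null (rows that do not pass the filter are not ingested). It assumes a hypothetical column named `price`, and the helper function names are made up for illustration. Python is only used to build and print the JSON that would go into the ingestion spec. It also assumes NaN values reach Druid as nulls; a literal NaN is not null and would not be caught by this filter.

```python
import json


def not_null_filter(column):
    """Druid filter matching rows where `column` is NOT null.

    A `selector` filter with "value": null matches nulls; wrapping it in a
    `not` filter inverts that. (Hypothetical helper, for illustration.)
    """
    return {
        "type": "not",
        "field": {"type": "selector", "dimension": column, "value": None},
    }


def transform_spec_dropping_nulls(columns):
    """Build a transformSpec that keeps only rows where every listed column is non-null."""
    if not columns:
        raise ValueError("need at least one column to filter on")
    filters = [not_null_filter(c) for c in columns]
    # A single column needs no "and" wrapper; several columns must all be non-null.
    row_filter = filters[0] if len(filters) == 1 else {"type": "and", "fields": filters}
    return {"transformSpec": {"filter": row_filter}}


# "price" is a hypothetical column name; json.dumps turns None into null.
spec = transform_spec_dropping_nulls(["price"])
print(json.dumps(spec, indent=2))
```

Depending on the cluster's null-handling mode, a null `selector` may also match empty strings, so it is worth checking the filter behaviour against your Druid version.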

If you still need the row but want to skip aggregation for null/NaN values, please refer to the filtered aggregator: https://druid.apache.org/docs/latest/querying/aggregations#filtered-aggregator
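For the second option, here is a minimal sketch that builds a `filtered` aggregator wrapping a `doubleSum`, so the sum only sees rows where the column is non-null while the rows themselves are kept. The column `price`, the output name `price_sum`, and the helper function are hypothetical. The resulting JSON object would go into the `aggregations` list of a query (or a metrics spec). As above, it assumes NaN values arrive as nulls.

```python
import json


def filtered_sum_skipping_nulls(column, output_name):
    """Druid `filtered` aggregator: a doubleSum over `column` that only
    aggregates rows where `column` is not null. (Hypothetical helper.)"""
    return {
        "type": "filtered",
        # Same not(selector == null) pattern: pass only non-null values.
        "filter": {
            "type": "not",
            "field": {"type": "selector", "dimension": column, "value": None},
        },
        # The wrapped aggregator does the actual summing.
        "aggregator": {"type": "doubleSum", "name": output_name, "fieldName": column},
    }


agg = filtered_sum_skipping_nulls("price", "price_sum")
print(json.dumps(agg, indent=2))
```

The difference from the transform-spec approach is that the row survives: other dimensions and metrics on that row are still available, and only this one aggregation ignores the null value.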