Problem with query lookup application - bug?

I want to map empty strings to “N/A” in query results. However, the results are strangely split between “null” and “N/A” (query examples below), as if the mapping
were not being applied in all cases.

When we don’t use the mapping, there is only one category: “null”. The raw data, by the way, only has empty strings for nulls.

Interestingly, it seems like the rows whose “” are not mapped to “N/A” are those that have timestamps that come before the first row with a non-null entry.

This seems like a bug in Druid (tried both 0.8.2 and 0.9.2) — or am I doing something wrong?

{
  "queryType": "groupBy",
  "dataSource": "my-datasource",
  "granularity": "all",
  "descending": "true",
  "dimensions": [
    "my-dimension",
    {
      "type": "extraction",
      "dimension": "mydimension",
      "outputName": "mappeddimension",
      "extractionFn": {
        "type": "lookup",
        "retainMissingValue": true,
        "lookup": {
          "type": "map",
          "map": {
            "": "N/A"
          }
        }
      }
    }
  ],
  "aggregations": [
    {
      "name": "aggregatedmetric",
      "type": "doubleSum",
      "fieldName": "my_metric"
    }
  ],
  "intervals": ["2017-03-10T00:00:00.000/2017-03-10T10:00:00.000"]
}


The empty values are split in the result:

[
  {
    "version": "v1",
    "timestamp": "2017-03-10T00:00:00.000Z",
    "event": {
      "mydimension": null,
      "aggregatedmetric": 100130,
      "mappeddimension": null
    }
  },
  {
    "version": "v1",
    "timestamp": "2017-03-10T00:00:00.000Z",
    "event": {
      "mydimension": null,
      "aggregatedmetric": 46534,
      "mappeddimension": "N/A"
    }
  },


]


If I replace the "": "N/A" entry in the map with one that has no effect ("foobar": "N/A"),
the result groups all the null values together:

[
  {
    "version": "v1",
    "timestamp": "2017-03-10T00:00:00.000Z",
    "event": {
      "mydimension": null,
      "aggregatedmetric": 146664,
      "mappeddimension": null
    }
  },

]


Note that 100130 + 46534 = 146664.
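
Until the cause is understood, the split groups could be merged on the client side after the query returns. The following is only a rough sketch of that idea, not a fix for the underlying behaviour: the helper name `merge_null_groups` is made up for illustration, and the two rows are copied from the split result above.

```python
from collections import defaultdict

def merge_null_groups(rows, dim="mappeddimension",
                      metric="aggregatedmetric", label="N/A"):
    """Sum `metric` per value of `dim`, folding null and "" into `label`."""
    totals = defaultdict(int)
    for row in rows:
        event = row["event"]
        key = event.get(dim)
        if key is None or key == "":
            key = label  # treat unmapped nulls like the mapped ones
        totals[key] += event[metric]
    return dict(totals)

# The two rows from the split groupBy result.
rows = [
    {"event": {"mydimension": None, "aggregatedmetric": 100130,
               "mappeddimension": None}},
    {"event": {"mydimension": None, "aggregatedmetric": 46534,
               "mappeddimension": "N/A"}},
]

merged = merge_null_groups(rows)
print(merged)
```

The merged total matches the single group returned when the "" entry is not in the map.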

Thanks,

  • Tomas

I’m willing to bet this is something related to the difference between a column that has some nulls in it and some non-nulls, and a column that doesn’t exist at all.

Could you raise a bug at https://github.com/druid-io/druid/issues please?

Opened https://github.com/druid-io/druid/issues/4301

The schema does not change through the date range, but indeed, there
are some segments that only have "" for that dimension, before the
non-empty values start showing up.

Thanks,

- Tomas

I can confirm this behaviour and also that the nulls are caused by the data that doesn’t have the column in the schema yet.
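
To make the hypothesis concrete, here is a toy model in Python. It is not Druid's code: the `Segment` class and the `extract` and `group_by_with_lookup` functions are invented for illustration. It simply assumes that a segment whose values for the dimension are all empty behaves as if the column were absent, so its rows come back as null without the lookup ever being consulted, while a segment that has at least one non-empty value does apply the lookup to its empty strings. The metric values reuse the two totals from the thread, plus one made-up non-empty row.

```python
from collections import defaultdict
from dataclasses import dataclass

LOOKUP = {"": "N/A"}  # the map from the query

@dataclass
class Segment:
    rows: list  # (dimension value, metric) pairs

    @property
    def has_column(self):
        # Assumption: a dimension with only empty values is treated as absent.
        return any(value != "" for value, _ in self.rows)

def extract(segment, value):
    if not segment.has_column:
        return None  # assumed behaviour: lookup is skipped, value stays null
    # retainMissingValue = true: unmapped values pass through unchanged
    return LOOKUP.get(value, value)

def group_by_with_lookup(segments):
    totals = defaultdict(int)
    for segment in segments:
        for value, metric in segment.rows:
            totals[extract(segment, value)] += metric
    return dict(totals)

early = Segment(rows=[("", 100130)])             # only empty values
later = Segment(rows=[("", 46534), ("foo", 7)])  # "foo" row is made up

result = group_by_with_lookup([early, later])
print(result)
```

Under these assumptions the empty values split into a null group and an "N/A" group in the same way as the query result, which is consistent with the explanation above but does not prove it.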