Is 0 filling a groupBy query possible?

Hello,

I understand that timeseries queries will 0 fill by default and that groupBy queries don’t.

Is there a method of 0 filling groupBy queries at all?

These particular queries don’t fit into the timeseries model that I can see.

Example:
{

“queryType”: “groupBy”,

“dataSource”: {

“type”: “table”,

“name”: “ds”

},

“dimensions”: [

{

“type”: “listFiltered”,

“delegate”: {

“type”: “default”,

“dimension”: “id”

},

“values”: [

“myId”

],

“isWhitelist”: true

},

{

“type”: “listFiltered”,

“delegate”: {

“type”: “default”,

“dimension”: “currency”

},

“values”: [

“GBP”

],

“isWhitelist”: true

}

],

“granularity”: “hour”,

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “id”,

“value”: “myId”

},

{

“type”: “selector”,

“dimension”: “currency”,

“value”: “GBP”

}

]

},

“aggregations”: [

{

“type”: “doubleSum”,

“name”: “value”,

“fieldName”: “value”

}

],

“intervals”: [

“2018-08-01T05:00:00Z/2018-08-22T19:00:00Z”

]

}

``

And we also have nested versions of the above with post aggregations.

So would there be a way of 0 filling the periods where there is no data?

Thanks,

Dyana

Hi Dyana,

I think the easiest way is to use the CASE-WHEN function of SQL (http://druid.io/docs/latest/querying/sql#other-functions).

If you want to use groupBy, I think you can use the expression virtualColumn (http://druid.io/docs/latest/querying/virtual-columns.html#expression-virtual-column).

Since Druid currently treats null values as empty strings, I think the below virtual column would work.

{

“type”: “expression”,

“name”: ,

“expression”: “if(currency == ‘’, 0, currency)”

}

To use this in the groupBy query, you can just add ‘virtualColumns’ field like below.

{

“queryType”: “groupBy”,

“virtualColumns”: [

“virtualColumn”: {

“type”: “expression”,

“name”: ,

“expression”: “if(currency == ‘’, 0, currency)”

}

]

}

For more supported expressions, please check http://druid.io/docs/latest/misc/math-expr.html.

Please note that the expression system is not much optimized and maybe the query performance is not good as you expect.

Best,

Jihoon

Hi,

Thanks. I’ve been testing with the virtual columns and expressions to see if it will work in all cases.

Question though, the fields that you can use in a virtual column expression, are these only the fields declared as dimensionSpecs in the datasource, or can you also use metricSpec fields?

Perhaps I’ve gotten some syntax wrong, but a metricSpec field in an expression seems to always be null for me.

Dyana

Hi Dyana,

I believe Jihoon’s suggestion would work if you are getting ‘null’ back for certain dimensions or metrics in the groupBy result, but not if you really have no data at all. If there is no data for a particular hour then groupBy will not emit any data for that hour.

aye, “skipping” ranges where there is literally no segment is what I expected after tracing through the relevant Druid.io code.

However for 0-filling where the segment exists in cases where the groupby is needed (a nested groupby) removing the list filtered dimension (but keeping the filter) and using a virtual column has enabled it to 0 fill and not return multiple results for a given time period.

Thanks,
Dyana