Issue with Filter groupBy Query Results

Hi Team,

I am trying to run a query with multiple filter that includes filter on a dimension column that has timestamp value. The query works fine when I add the filter type “interval” in groupBy query filter. However, it ignores the same filter if I apply using the havingSpec.

The concern is the query without interval filter runs in 600ms whereas with interval filter, it is taking ~7 seconds. I can apply the same filter on resultant data which would be much smaller and hopefully improve the query performance.

Is there any limitation on what filter we can use in havingSpec. I am using Druid 0.10.1 version and the doc states it supports any druid query filter.

Would appreciate your input here or any suggestion on how to tune this query further.

Here are queries:

This queries works fine with interval filter.

{

“queryType”:
“groupBy”,

“dataSource”: “testDataSource”,

“descending”:
“false”,

“granularity”:
“all”,

“dimensions” : [{

    "type": "extraction",

     
    "dimension": "sta",

     
    "outputName": "sta_interval",

     
    "extractionFn": { "type":

“timeFormat”, “format”: “yyyy-MM-dd HH:mm”,
“granularity”: “fifteen_minute” }

    }],

“aggregations” : [

      {  "type" :

“longSum”, “name” : “packageCount”,
“fieldName”: “package_exists” }

     ],

“filter”: {
“type”: “and”, “fields”: [ { “field”:
{“type”: “selector”, “dimension”:
“latest_rml”,“value”: “DELH” }, “type”:
“not”},

     { "type": "selector",

“dimension”: “next_rml_cd”,“value”: “DELH”
},

     { "type": "in",

“dimension”: “outbound_l4cons_type_cd”,“values”:
[“Flight”, “Truck”, null] },

     { "type": "in",

“dimension”: “enroute_status”,“values”: [“Import”,
“Export”, “Intransit”, “Domestic”] },

     { "type": "interval",

“dimension”: “sta”, “extractionFn”: {
“type”: “timeFormat”, “asMillis”: true},

        "intervals" :

[“2018-04-27T18:00:00.000Z/2018-04-27T20:00:00.000Z”]

     }

     
                 
  ]

},

“intervals”: [

“2018-04-26T00:00:00/2018-04-28T00:00:00”

]

}

This query doesn’t apply the filter correctly in HavingSpec. It is returning all the results.

{

“queryType”:
“groupBy”,

“dataSource”: “testDataSource”,

“descending”:
“false”,

“granularity”:
“all”,

“dimensions” : [{

    "type": "extraction",

     
    "dimension": "sta",

     
    "outputName": "sta_interval",

     
    "extractionFn": { "type":

“timeFormat”, “granularity”: “fifteen_minute” }

    }],

“aggregations” : [

      {  "type" :

“longSum”, “name” : “packageCount”,
“fieldName”: “package_exists” }

     ],

“filter”: {
“type”: “and”, “fields”: [ { “field”:
{“type”: “selector”, “dimension”: “latest_rml”,“value”:
“DELH” }, “type”: “not”},

     { "type": "selector",

“dimension”: “next_rml_cd”,“value”: “DELH”
},

     { "type": "in",

“dimension”: “outbound_l4_cons_type_cd”,“values”:
[“Flight”, “Truck”, null] },

     { "type": "in",

“dimension”: “enroute_status”,“values”: [“Import”,
“Export”, “Intransit”, “Domestic”] }

  ]

},

“intervals”: [

“2018-04-26T00:00:00/2018-04-28T00:00:00”

],

“havingSpec”: {
“type”: “filter”,

     "filter": { "type":

“interval”, “dimension”: “sta_interval”,
“extractionFn”: { “type”: “timeFormat”, “asMillis”:
true},

        "intervals" :

[“2018-04-27T18:00:00.000Z/2018-04-27T19:00:00.000Z”]

               }

     
   }

}

How is the dimension “sta” defined in your ingestion spec?

Hi Jonathan,

sta dimension is defined as string, however the values are ISO8601 timestamp.

Ah, I think I see the issue with your second query:

“havingSpec” in the query should be “having” instead, can you try that?

Thanks Jonathan. That fixed it. Seems like docs need be updated as it says “havingSpecs” instead of “having”

Thanks,

Kashif