Search query slow with filter

Hello!

The following unfiltered search query takes ~2 seconds:

{
  "queryType": "search",
  "dataSource": "DATASOURCE",
  "searchDimensions": [
    "DIMENSION"
  ],
  "query": {
    "type": "insensitive_contains",
    "value": "foo"
  },
  "granularity": "all",
  "intervals": ["2016-11-12T05:00:00+00:00/2016-12-12T13:00:00+00:00"]
}

Now I add this filter:

"filter": {
  "type": "selector",
  "dimension": "DIMENSION2",
  "value": "BAR"
}

Running the same query with this filter takes ~22 seconds.
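
For reference, the filter goes in as a top-level key of the query object, next to "query" and "intervals". Below is a minimal Python sketch of how the filtered query could be assembled from the unfiltered one. The helper name with_filter is made up for illustration; the field names and values are the ones from this post.

```python
import json

# The unfiltered search query from above, as a plain dict.
base_query = {
    "queryType": "search",
    "dataSource": "DATASOURCE",
    "searchDimensions": ["DIMENSION"],
    "query": {"type": "insensitive_contains", "value": "foo"},
    "granularity": "all",
    "intervals": ["2016-11-12T05:00:00+00:00/2016-12-12T13:00:00+00:00"],
}


def with_filter(query, dimension, value):
    """Hypothetical helper: return a copy of `query` with a selector filter added."""
    filtered = dict(query)  # shallow copy, so the original query is left unchanged
    filtered["filter"] = {
        "type": "selector",
        "dimension": dimension,
        "value": value,
    }
    return filtered


filtered_query = with_filter(base_query, "DIMENSION2", "BAR")

if __name__ == "__main__":
    # Print the JSON body that would be sent for the filtered query.
    print(json.dumps(filtered_query, indent=2))
```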

Some info:

The cardinality of “DIMENSION” is somewhere between 100,000 and 1,000,000, and that of “DIMENSION2” is 15-20.

We are running Druid 0.9.1. DATASOURCE contains around 41GB of data over one month, with hourly rollup and Concise bitmaps. Our timeseries and topN queries with filters do not show the same drastic slowdown.

Is this behavior expected? Is there anything we can do to speed up the filtered query? Let me know if there is any other information I can provide to help narrow down the issue.

Best,

John

I think the issue here is that a filtered search query uses an index-only approach involving a bitmap intersection for each search dimension value that matches the “query”. The idea is this should be faster than scanning through the rows that match the filter and picking up all dimension values for those rows. But if you have a lot of values that match the query, or if the filter is very selective, or both, then this assumption could be wrong. Ideally the search query should use some heuristics to choose between the index-only vs. cursor-based algorithms, or at least provide a context flag to let you choose which one gets used.

I raised https://github.com/druid-io/druid/issues/3775 about this.
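
As a rough illustration of the two approaches described above, here is a toy Python sketch. It is not Druid's actual implementation: the rows, the set-based "bitmaps", and the function names are all made up for this example. It only shows why the index-only path does one intersection per matching search value, while the cursor-based path does work proportional to the number of rows that pass the filter.

```python
# Toy "segment": each row holds the values of two dimensions (made-up data).
ROWS = [
    {"DIMENSION": "foo1", "DIMENSION2": "BAR"},
    {"DIMENSION": "foo2", "DIMENSION2": "BAZ"},
    {"DIMENSION": "xfoo", "DIMENSION2": "BAR"},
    {"DIMENSION": "other", "DIMENSION2": "BAR"},
]


def build_index(rows, dim):
    """Map each value of `dim` to the set of row ids containing it.

    Python sets stand in for the bitmaps of a real inverted index.
    """
    index = {}
    for row_id, row in enumerate(rows):
        index.setdefault(row[dim], set()).add(row_id)
    return index


SEARCH_INDEX = build_index(ROWS, "DIMENSION")
FILTER_INDEX = build_index(ROWS, "DIMENSION2")


def search_index_only(needle, filter_value):
    """Index-only: one bitmap intersection per search value that matches."""
    filter_bitmap = FILTER_INDEX.get(filter_value, set())
    hits = {}
    for value, bitmap in SEARCH_INDEX.items():
        if needle.lower() in value.lower():
            count = len(bitmap & filter_bitmap)  # cost grows with matching values
            if count:
                hits[value] = count
    return hits


def search_cursor(needle, filter_value):
    """Cursor-based: walk only the rows that pass the filter."""
    filter_bitmap = FILTER_INDEX.get(filter_value, set())
    hits = {}
    for row_id in sorted(filter_bitmap):  # cost grows with filtered row count
        value = ROWS[row_id]["DIMENSION"]
        if needle.lower() in value.lower():
            hits[value] = hits.get(value, 0) + 1
    return hits


if __name__ == "__main__":
    print(search_index_only("foo", "BAR"))
    print(search_cursor("foo", "BAR"))
```

Both functions return the same hits. Which one is cheaper depends on how many dimension values match the search string versus how many rows pass the filter, and that is the trade-off a heuristic or context flag would have to decide.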

Thank you for your reply, Gian, much appreciated! I will follow the GitHub issue for updates.

Best,

John