druid query performance with filters

Hi All,
I’m seeing some interesting behavior on druid query performance based on the dimensional filters that I am applying. Was hoping I might get some insight here. This is the test I ran:

-Queries are made on a druid data source with 22 dimensions.

-I made two timeseries queries with all parameters the same, except for the filters. useCache was also disabled in the druid query.

-Query1 had a filter with 3 values or’d, applied across dimension A. Dimension A has cardinality 4.

-Query2 had a filter with 3 values or’d, applied across dimension B. Dimension B has cardinality 7.

-I ran the druid queries by curling the druid broker, and prepending the curl with “time” to measure its performance.

-Query1 would average around 0.7s real time, and Query2 would average around 0.35s real time.

My understanding up to this point was that each dimension is stored as a column in the segment, and when filtering, an internal bitmap of each column for each possible value is pulled out and or’d (if the value is specified in the filter). From this, I don’t see how there could be such a vast difference (2X) in query performance, simply by filtering on different columns. Can anyone shed some light on this? Is there something else that could affect query performance from the dimensions I am missing?

I’m not sure if I’ve described the problem well enough, please let me know if I can supply any additional information.



Depends on what’s taking up compute time. If you’re doing something like hyperUniques or cardinality query, then these results are reasonable.

For a simple example, if your data is completely random between the 4 and the 7 values, then each of the 4 will have approximately 1/4 of the values associated with it, and each of the 7 will have approximately 1/7th

If you take 3 of the 4 values you (in our tidy random case) have 3/4 of the data

if you take 3 of the 7 values you (in our tidy random case) have 3/7 of the data.

3/7 is pretty close to half of 3/4, so if the dimension values are normally distributed and your speed is limited by aggregate compute time, then you would expect the 3/4 case to take about twice as long as the 3/7 case since there are about twice the values.

Does that case sound anything like the data you have?

*uniformly distributed…

Can you elaborate on what steps occur during aggregate compute time? It sounds like it is proportional to the percentage of values that are being pulled out by each filter.

I ran another experiment based on this assumption. I queried for both dimensions again, and filtered for all 4 and 7 values, respectively. That should mean for both queries, there are the same number of values being evaluated in the aggregate compute time step.

Doing these two queries, I am seeing the query on dimension A (cardinality 4) take about 1.5x as long as the query on dimension B (cardinality 7). So when controlling for the total number of rows to scan/aggregate over, there is still a difference in query time and it seems to be affected by the cardinality of the column. But it looks like the lower cardinality columns take longer, which is the opposite of what I would have expected?

The aggregations I am doing is a longSum on one of the metrics, and I added a count aggregator to ensure I was scanning the same number of rows for each query. I also threw in a third query with no filters to ensure the counts and longSum matched.

I think Charles’s guess was that you are actually scanning a lot more rows with query 2 than with query 1. But if you control for that and scan the same number of rows in both queries, the performance shouldn’t be THAT different…

How long does it take to do an unfiltered query, vs a query with an OR of all the 4 values for dimA, vs a query with an OR of all the 7 values for dimB? The unfiltered query should be the fastest, since the filtered ones are looking at the indexes but not getting any benefit from them.

The difference you see might be due to how long it takes Druid to union the compressed bitmaps.

I ran a large sample for each of the queries. I got on average:

unfiltered: 270 ms

QueryA (4 value cardinality): 683 ms

QueryB (7 value cardinality): 466 ms

It sounds like there’s not much I can do as this gets to the level of the compression algorithm. One additional question: do more layers in the druid query filters drastically affect query performance? I don’t think I’m optimizing as much as I can there to reduce the layers of filters, and I’m wondering if that’s a good avenue to go down to optimize my druid query performance.

What do you mean by layers of filters?

Our druid queries are programmatically generated, so the filters may be generated with extraneous multiple nested and/or layers, which could be reduced in terms of number of layers. For example:
and filter

and filter

and filter

or filter





this could be optimized to just be:

and filter

or filter





Ah I see. An and filter of just one thing shouldn’t add noticeable overhead.

agree that it will not be an overhead. but removed unnecessary and/or filter in https://github.com/druid-io/druid/pull/2704.

2016년 3월 23일 수요일 오전 1시 30분 9초 UTC+9, Gian Merlino 님의 말:

Have you tinkered with roaring vs concise bitmap compression by chance?

Hey guys,

very interesting conversation!

I’m having the same behaviour in our cluster. If I query one month worth of data with no filter applied, a topn query with one split would take around 30 seconds to complete. If I apply a filter on a single value of a single dimension, then only a fraction of the rows are matched and Druid needs to sum up fewer metric values and the query comes back in 5 seconds. So our cluster also seems to be extremely bound by this “aggregate compute time”.

It seems logical for me that summing up a lesser amount of values takes a lesser amount of time, but the question indeed is if this is really a normal or abnormal behaviour.

The segment scan times are also directly proportional to which fraction of records survive a filter. Without a filter applied and aggregating over 4 simple metrics I see segment scan times of 2 seconds (r3.8xarge nodes). Even for a single metric the scan time seems brutally high. Only if I apply a filter that lets only a fraction of the rows “survive” (lets say filter on one value of a dimension with cardinality 200) do I see segment scan times of 100 ms.

To my understanding, the segment scan time is per segment and per CPU core, so given the same underlying hardware (e.g. r3.8xlarge node) and given the same number of rows per segment (lets say 5 million) and given the same aggregation (lets say longSum), all people should be seeing the same segment scan times regardless of their datasource, rollup-ratio or other conditions, given that no filter is applied. We would benefit greatly from knowing what the expected scan times would be for the 1-metric-5mil-rows-no-filter use-case. It would give people more certainty about whether what they see in their clusters is normal or abnormal.