Broker cache completely ignored if query interval shifts by 5 seconds

Hi,

I have a fairly simple Druid cluster setup with a single broker node. Caching on the broker is configured with the defaults (i.e. a local LRU cache).

I execute a timeseries query and I notice that if I resend the query, the cache is used as expected. This is verified by looking at the cache metrics.

However, if I send the same query but shift the intervals so that the from/to values are 5 seconds later than the values used in the original query, I get 0 cache hits.

This doesn’t make sense to me. Shouldn’t the broker look up cached results by segment, with the query interval excluded from the key?

If I run the two queries with “bySegment” enabled, I can confirm they hit the same segments.

Oddly enough, when I run a query and change only the ‘to’ value to 5 seconds later, the cache is used.

I’m using version 0.11.0 of Druid.

Is this a bug?

Thanks,

Liz

Hi Liz,

The cache is by segment, and the time range within that segment is part of the cache key. If you’re using a typical segmentGranularity, like “hour”, then changing the from/to timestamp within a particular granularity bucket (hour) means Druid won’t use the cache for that bucket. The idea is that if you have hour-granularity segments and you do a past-6-hours query, potentially changing the “from” and “to” each time you query, then at least the middle 4 segments will always be pulled from the cache.
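If it helps, here is a rough sketch of the bucketing idea in Python (illustrative only, not the actual cache-key code; the helper below just mimics how a query interval gets clipped into granularity buckets):

from datetime import datetime, timedelta

def hour_buckets(start, end):
    # Split [start, end) into hour-granularity buckets, clipped to the query interval.
    # Each (bucket_start, bucket_end) pair stands in for the per-segment time range
    # that ends up in the cache key.
    buckets = []
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor < end:
        nxt = cursor + timedelta(hours=1)
        buckets.append((max(cursor, start), min(nxt, end)))
        cursor = nxt
    return buckets

# Two "past 6 hours" queries issued 5 seconds apart.
end1 = datetime(2018, 2, 20, 15, 0, 10)
end2 = end1 + timedelta(seconds=5)
b1 = hour_buckets(end1 - timedelta(hours=6), end1)
b2 = hour_buckets(end2 - timedelta(hours=6), end2)

# The interior hour buckets are identical between the two runs, so those cache entries
# can be reused; only the partial buckets at the edges of the window differ.
print(sorted(set(b1) & set(b2)))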

Gian,

Thank you for the answer. That explains it for me.

We are using segmentGranularity=DAY and the query interval covers 24 hours. For example, we query [2018-02-19T15:00:00/2018-02-20T15:00:00], then repeat the query for [2018-02-19T15:00:05/2018-02-20T15:00:05], and so on as a 5-second sliding window.
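If I’ve understood the bucketing correctly, our two example intervals split into DAY buckets roughly like this (an illustrative sketch, not what Druid actually runs):

from datetime import datetime, timedelta

def day_buckets(start, end):
    # Split [start, end) into DAY-granularity buckets, clipped to the query interval.
    buckets = []
    cursor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while cursor < end:
        nxt = cursor + timedelta(days=1)
        buckets.append((max(cursor, start), min(nxt, end)))
        cursor = nxt
    return buckets

q1 = day_buckets(datetime(2018, 2, 19, 15, 0, 0), datetime(2018, 2, 20, 15, 0, 0))
q2 = day_buckets(datetime(2018, 2, 19, 15, 0, 5), datetime(2018, 2, 20, 15, 0, 5))
# q1 -> [(02-19 15:00:00, 02-20 00:00:00), (02-20 00:00:00, 02-20 15:00:00)]
# q2 -> [(02-19 15:00:05, 02-20 00:00:00), (02-20 00:00:00, 02-20 15:00:05)]
# Both DAY buckets are partial, and both clipped ranges move with the sliding window,
# so every 5-second slide produces new per-segment cache keys and 0 cache hits.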

We tried using segmentGranularity=HOUR but hit OOM issues on the historical node, even with small amounts of data and a 12G heap.

Do you know if there is any way we can achieve what we are doing (a 24-hour sliding window query) and still take advantage of caching on the broker?

Thanks,

Liz

Hi Liz,

What kind of OOMs are you getting on historicals with “hour” granularity? In general, Druid should be managing memory better than that. I wonder if you are using one of the query types known to cause memory issues, such as “select” with a high threshold (which can be replaced with “scan” if so).
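To make the “select” vs “scan” point concrete, the swap looks roughly like this (the datasource, columns, and limits below are placeholders, and this assumes the scan query is available in your deployment):

# A "select" query with a large pagingSpec threshold buffers a lot of rows on the heap:
select_query = {
    "queryType": "select",
    "dataSource": "my_datasource",                     # placeholder
    "intervals": ["2018-02-19T15:00:00/2018-02-20T15:00:00"],
    "granularity": "all",
    "dimensions": [],
    "metrics": [],
    "pagingSpec": {"pagingIdentifiers": {}, "threshold": 100000},
}

# A "scan" query returns the same raw rows but streams them instead of buffering:
scan_query = {
    "queryType": "scan",
    "dataSource": "my_datasource",                     # placeholder
    "intervals": ["2018-02-19T15:00:00/2018-02-20T15:00:00"],
    "columns": [],                                     # empty list means all columns
    "limit": 100000,
}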

Hi Gian,

We saw cases where too much time was spent in GC, and also a couple of situations where one of the survivor spaces would get stuck at 100% with no cleanup occurring.

In terms of query types, we do timeseries with histogram aggregations (approxHistogramFold) and groupBy on a single dimension with low cardinality.
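For reference, the timeseries side is roughly of this shape (the datasource, field names, and sizing parameters below are placeholders, with the approximate histograms extension loaded):

timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",                     # placeholder
    "granularity": "minute",                           # placeholder
    "intervals": ["2018-02-19T15:00:00/2018-02-20T15:00:00"],
    "aggregations": [
        {
            "type": "approxHistogramFold",
            "name": "latency_histogram",               # placeholder
            "fieldName": "latency_sketch",             # placeholder
            "resolution": 50,
            "numBuckets": 7,
        }
    ],
}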

Thanks,

Liz

Hi Gian,

Just letting you know that I’m going to go back to segmentGranularity=HOUR. We have made other config changes since then, so it’s probably worth giving this another try: we are using the Kafka indexing service for ingestion, and I increased the taskDuration and reduced the number of topic partitions, so we should be producing fewer shards within each segment.
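For the record, the relevant part of our supervisor spec ioConfig now looks roughly like this (the topic name, broker address, and exact values are placeholders):

kafka_supervisor_io_config = {
    "topic": "my_topic",                                          # placeholder
    "consumerProperties": {"bootstrap.servers": "kafka01:9092"},  # placeholder
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT4H",   # increased from the default of PT1H
}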

Thanks for your help thus far.

Liz