Cache on historical nodes

Hi guys,

For enabling caching on historical nodes, is it enough to set the

druid.segmentCache.locations

parameter, or should I also set

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

Thanks, Vadim.

The segmentCache is not a query cache. It is where segments will be stored.

You'll need to set your cache configs in the common configuration and turn on

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
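
For illustration, a minimal sketch of what the historical node's properties could look like with both pieces in place (the path and maxSize below are placeholders, not values from this thread):

# Segment cache: local disk location where the historical stores segments pulled from deep storage
# (placeholder path and maxSize, adjust for your cluster)
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":100000000000}]

# Query result cache on the historical node
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true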

Should I also set druid.cache.sizeInBytes ("Maximum cache size in bytes. Zero disables caching.", default 0)?

Will it apply to both broker and historical query caches?

So, I have these lines in all nodes' common config:

druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

And I could not see any difference for the same query run several times. I ran a groupBy on 2 dimensions + 1 count on all data shared across 3 historical nodes.
Shouldn't I get the result almost instantly with query caching enabled?

Thanks, Vadim.

Hey Vadim,

Yes, you do need to set druid.cache.sizeInBytes; otherwise you'll have a cache with zero capacity, so you'll see no benefit.

Also note that by default the cache will not store results for groupBy or select queries. If you want to enable caching for groupBys, you'll need to set druid.broker.cache.unCacheable= and druid.historical.cache.unCacheable=, see: http://druid.io/docs/latest/configuration/historical.html#caching
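
For example, the relevant properties might look like this (the 256 MiB size is only an illustrative value; listing only "select" keeps select queries uncacheable while letting groupBy results be cached):

# Query cache capacity; the default of 0 disables caching (256 MiB here is just an example)
druid.cache.sizeInBytes=268435456

# The default unCacheable list includes both groupBy and select; keeping only "select" enables groupBy caching
druid.historical.cache.unCacheable=["select"]
druid.broker.cache.unCacheable=["select"]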

So, the historical nodes' cache worked very well for groupBy after the druid.historical.cache.unCacheable modification.
However, I've done the same for the broker, so I expected the historical nodes not to be hit at all. Nevertheless, I can see in the historical nodes' logs that they process the query (even if it comes from the cache and is pretty fast). Any idea?

The configuration from common config on each node is:
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

druid.historical.cache.unCacheable=["select"]
druid.broker.cache.unCacheable=["select"]

Hmm, you're correct that the broker node should not be hitting the historicals for a cached query. I'm assuming that you're using a single broker node, as opposed to a set of brokers behind a load balancer, in which case your second query might be hitting a different broker node. The only other thing I can think of is that perhaps you haven't allocated a large enough cache on the broker with druid.cache.sizeInBytes, so the broker isn't able to fit the results in the cache. GroupBy results can get pretty large.
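
For example, a 1 GiB cache on the broker would look something like this in its runtime.properties (illustrative value only):

# 1 GiB query result cache on the broker (1024 * 1024 * 1024 bytes)
druid.cache.sizeInBytes=1073741824
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true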

Yep, I'm using a single broker, and it worked after I set the cache size to 1 GB.

However, it takes a long time to return the result from the broker's cache, ~20 s. I wonder why it takes so long, and why I need so much cache memory for a query limited to 1000 rows (a few kB of data), even if it's a groupBy. Doesn't it just cache key -> value pairs? Is there any way to see the size of the data in the cache?

Thanks, Vadim Vararu.

Hopefully someone more familiar with the cache code will be able to answer this; 20 seconds feels too long. One way you could inspect the cache would be to set up a local memcached server, point Druid at it, and then inspect memcached directly.
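
A rough sketch of what pointing Druid at a local memcached could look like in the common properties (the host and port are placeholders; see the caching docs linked above for the full set of options in your version):

# Use memcached as the cache backend instead of the default local heap cache
druid.cache.type=memcached
# Comma-separated memcached host:port pairs; a single local instance here (placeholder)
druid.cache.hosts=127.0.0.1:11211

You could then check what is actually being stored with memcached's own stats interface, for example: echo stats | nc 127.0.0.1 11211, which reports item counts and bytes used.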

Vadim, share your full configuration. JVM tuning, etc.

Share hardware details, runtime configurations, common configurations, and so on.

I have a related question. If:

s3cmd du s3://druid/base/segments/

reports 1 GB, does it mean that if I set druid.cache.sizeInBytes big enough, there won't be any read requests to S3 because everything fits into the cache?

How big should the cache be for 1 GB of physical segments?

Does it also mean that if I disable druid.historical.cache.useCache, then a broker query would trigger an S3 request?