Cache Performance

As our cluster grew, we moved the cache from the broker tier to the historical nodes, and saw much better latency and cache utilization. We use the local cache exclusively, rather than memcached.

Recently we noticed a pretty serious performance hit once the cache filled up and the eviction rate increased. Even though the cache hit rate was still good, historical load was much higher, and GC times were about 3x normal.

Would we benefit from using the new caffeine cache extension that Gian created a while back? We’re still on 0.9.0.

Thanks for any tips.

Regards,

Max

Hi Max, the benchmarks for the caffeine extension show that it performs better than the local heap cache, and it has seen production usage at scale.

https://github.com/druid-io/druid/pull/3028 has more details. I would recommend trying it out.

Thanks, Fangjin

We got the extension installed, and it seems to be working well. We’ll monitor its performance for a few days.
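
For anyone finding this thread later, enabling the extension on a historical looks roughly like the sketch below. This is an illustration, not a copy of our production config: the cache size is a made-up value, and the property names should be checked against the extension docs for your Druid version.

```
# historical runtime.properties (sketch; values are illustrative assumptions)

# Load the caffeine cache extension alongside any others you already use
druid.extensions.loadList=["druid-caffeine-cache"]

# Switch the cache implementation from the default local cache to caffeine
druid.cache.type=caffeine

# Maximum cache size in bytes (assumed value; size it to your heap)
druid.cache.sizeInBytes=2000000000

# Keep caching on the historicals rather than the brokers
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
```

The extension needs Java 8, and the cache lives on the heap, so the size setting has to leave room for query processing.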

Regards,

Max

Just FYI, be sure to monitor heap metrics during this time. The cache itself is very efficient, but the way the extension uses it can cause extra heap pressure under high eviction rates. It worked out of the box on the majority of our nodes, but some of our nodes with very high query load needed GC tuning.

Just to tie up this thread, the caffeine cache extension turned out to work very well for us, even under high eviction rates.

We saw about 50% less GC time while under our peak load, healthier heap usage, and lower overall query latency across the whole cluster.

Recommended, +1

Cheers,

Max

Thanks for posting back!