Query Caching in Historical and Broker nodes

Is it recommended to enable query caching on all historical nodes and broker nodes?

http://druid.io/docs/0.9.2/querying/caching.html

This page seems to suggest that, for large data, caching should only be enabled on the historical nodes.

Hi Anuj,
As the page suggests, the best caching strategy depends on the size of the cluster and on the size of the result set to be merged on the brokers.

FWIW, when you enable caching on the broker, it fetches the results for each individual segment from either the historicals or the cache and merges them.

If your queries span a small number of segments, the overhead of merging the per-segment results at the broker is minimal and you should enable caching on the broker. If your queries span thousands of segments, enabling caching on the historicals is recommended, so that merging of bySegment results happens at both the historicals and the brokers, which gives more parallel processing for the queries.
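As a rough illustration of the two options, the runtime.properties could look something like the fragment below. This is only a sketch: it assumes the local (on-heap) cache, and the cache size is an arbitrary placeholder, so please check the caching docs for your Druid version before copying it.

```properties
# Option A: queries touch few segments -> cache on the broker
# (broker runtime.properties)
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
# placeholder size (about 1 GB), tune for your heap
druid.cache.sizeInBytes=1000000000

# Option B: queries touch thousands of segments -> cache on the historicals
# (historical runtime.properties; broker cache flags left at their defaults)
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=1000000000
```

The two halves go into different files (broker vs. historical), which is why the `druid.cache.*` keys appear twice.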

I would recommend benchmarking both strategies on your dataset before choosing one.

Thanks Nishant,

Totally agree with you; the document above also explains this.

However, would there be any issue with using a combination of both strategies? If caching is enabled on both broker and historical nodes:

  1. Broker caching will avoid any trips to historical nodes.

  2. Historical nodes will maintain a query cache for their segments and will be responsible for merging segments for a query. This will in turn reduce network bandwidth utilization and parallelize merge processing.

Is there any check at the code level that prevents using caching on both node types?

Hi,
There are checks in the code to ensure that cache population is done at only one of the two, broker or historical.

Another common way of using caching is to run a distributed memcached cluster and configure the brokers to read from the cache and the historicals to populate it during segment scans. This still has the benefit of merging bySegment results at both the historical and the broker nodes.
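A minimal sketch of that memcached setup, assuming the standard cache properties from the caching docs. The host names are hypothetical placeholders, and letting the historicals also read from the cache (`useCache=true`) is an assumption on top of what is described above.

```properties
# common.runtime.properties (shared by brokers and historicals)
druid.cache.type=memcached
# hypothetical hosts, replace with your own memcached nodes
druid.cache.hosts=memcached1.example.com:11211,memcached2.example.com:11211

# broker runtime.properties: read from the shared cache, do not write to it
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=false

# historical runtime.properties: write per-segment results during scans
# (useCache=true here is an assumption, not stated above)
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
```

With this split, only the historicals ever populate the cache, so it stays consistent with the population check mentioned above.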

Thanks Nishant for the explanation!

I gather from the docs that if druid.broker.cache.cacheBulkMergeLimit is used, then some results can get cached in the brokers, and others in the historicals.

However, when I try it, it seems that the historical caches stop getting populated. Is this the intended behavior? (Using Druid 0.9.2.)
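For reference, a setup of the kind described here would look roughly like the following. The limit of 100 is a made-up value for illustration, and the comment on it follows the reading of the docs given above.

```properties
# broker runtime.properties
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# hypothetical threshold: per the docs, queries touching more segments
# than this do not attempt to use the cache at the broker
druid.broker.cache.cacheBulkMergeLimit=100

# historical runtime.properties
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
```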

Thx.

Hello Nishant,

I am in the same boat, trying to enable the cache on both Historical and Broker nodes. But we are facing an issue when using the Broker and Historical caches at the same time.

We are currently using 40 historicals, which we have divided into 3 tiers: Tier 1, Tier 2 and Tier 3. We are using 3 Brokers, 1 for each tier. We are also using a Router to route each request to the correct Broker, so that a request meant for Tier 1 only uses the Broker reserved for Tier 1.

To improve performance, we enabled the Broker cache in Tier 1. But after enabling the Broker cache (local, not memcached), the Historical cache in Tier 1 is no longer being populated. This has made responses much slower than they were without the Broker cache, so performance is now worse. **How can I enable both the Broker and Historical caches at the same time, without using memcached?**

I understand that this will cause redundant data in the Broker and Historical caches, but we are OK with that.

Thanks,

Pankaj