Historical vs Broker caching

After running some load testing I’ve found caching with memcached on broker nodes to be more performant than memcached on historicals, at least in the query load per second that it can handle. I’m curious why the production configs suggest caching on the historicals instead, and whether there is something that I overlooked that would make me want to switch back. Any insight on the matter would be awesome.

Thanks!

Michael

I have the same confusion. Is there any guide for this? Thanks.

On Friday, September 18, 2015 at 12:58:21 PM UTC+8, mcap…@kochava.com wrote:

Thanks Michael, do you have some numbers to share? It would be nice to see where the difference comes from, and whether it’s an artifact of your benchmark setup / config or whether it’s something else that we can improve.

Typically the answer to your question is “it depends”. It depends a lot on the types of queries you run, and it depends on your data. Sometimes the broker can become a bottleneck if you have lots of segment results to merge, in which case it can be beneficial to off-load to historical nodes in order to distribute the merging load.
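For reference, pushing caching down to the historicals is mostly a matter of their runtime properties. It should look roughly like this, if I recall the property names correctly (please verify against the configuration docs for your version; the cache size is just an example value):

  druid.historical.cache.useCache=true
  druid.historical.cache.populateCache=true
  # local heap cache; memcached is the other common choice
  druid.cache.type=local
  druid.cache.sizeInBytes=1000000000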

What would you recommend for a small cluster:

master m4.large

middleManager m4.large

broker-1 m4.large

broker-2 m4.large

historical-1 m4.large

historical-2 m4.large

If the only production queries are “select … where … groupBy …” and only administrators use Pivot …

Since the broker cache doesn’t cache “select” and “groupBy” queries by default, I should probably use only the historical cache, right?

Or do you think it is a good idea to use only the broker cache and make “select” and “groupBy” queries cacheable by overriding druid.broker.cache.unCacheable?

Hi Jakub, given the small size of your cluster, caching on the broker will typically perform better.
However, I still encourage you to benchmark different configurations and draw your own conclusions.

Hi Xavier, but we issue only queries that are not cached by default on the broker node. Do you think it is wise to enable caching of select and groupBy queries?

Sure, I would give it a try. Assuming the query results are not too large, I don’t see any reason not to.
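If it helps, I believe the broker side would look roughly like this: the default unCacheable list contains groupBy and select, so you would override it with an empty list (treat this as a sketch and double-check the caching docs for your version):

  druid.broker.cache.useCache=true
  druid.broker.cache.populateCache=true
  # empty list = allow groupBy and select results into the broker cache
  druid.broker.cache.unCacheable=[]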

Perfect, I’ll give it a shot and write a query benchmark to see how it works out.

Btw this doc http://druid.io/docs/latest/querying/caching.html says that:

it is not recommended to enable caching on both Broker and Historical nodes

But it is not clear why and what problems it might cause. I’ll blindly follow this recommendation and disable caching on Historical nodes then 🙂

There’s no need to enable caching on both. If you turned on caching on the broker for example, the historical cache would never get used.

Actually, you can enable caching on both, and also set cacheBulkMergeLimit, which will limit the number of cache fetches the broker will try to do before falling back to querying the historical nodes.
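If it’s useful, a rough sketch of what that combination might look like (the limit value here is arbitrary; check the broker configuration docs for the exact semantics in your version):

  # broker-side cache
  druid.broker.cache.useCache=true
  druid.broker.cache.populateCache=true
  # queries spanning more segments than this skip the broker cache,
  # leaving caching and merging to the historicals
  druid.broker.cache.cacheBulkMergeLimit=100
  # historical-side cache
  druid.historical.cache.useCache=true
  druid.historical.cache.populateCache=true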

After 3 months in production I had to disable caching on the historical nodes, leaving only broker caching enabled, because we noticed that queries sometimes returned empty results, even queries that span 20+ segments. I hope it helps.

That really shouldn’t happen as a result of caching. Also, what kind of cache are you using?

These used to be my docker-compose settings; otherwise I stick to the defaults:

  • HISTORICAL_DRUID_HISTORICAL_CACHE_USECACHE=true

  • HISTORICAL_DRUID_HISTORICAL_CACHE_POPULATECACHE=true

  • HISTORICAL_DRUID_CACHE_SIZEINBYTES=1000000000

  • HISTORICAL_DRUID_CACHE_TYPE=local

  • BROKER_DRUID_BROKER_CACHE_USECACHE=true

  • BROKER_DRUID_BROKER_CACHE_POPULATECACHE=true

  • BROKER_DRUID_BROKER_CACHE_UNCACHEABLE=

And with these settings, all of a sudden, queries like:

SELECT COUNT(DISTINCT foo) WHERE blabla GROUP BY bar

that spanned as many as 20 segments started returning empty results …

BUT!!! Now I noticed that it started returning correct results after the plyql server restarted:

plyql -c 2 -h broker:8082 -i P2Y --json-server 8099

I had submitted some invalid queries that crashed it:

/usr/local/lib/node_modules/plyql/node_modules/q/q.js:155
        throw e;
        ^

Error: can not serialize an approximate unique value
    at UniqueAttributeInfo.serialize (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:715:19)
    at DruidExternal.makeSelectorFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3779:38)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3833:37)
    at DruidExternal.timelessFilterToDruid (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3823:37)
    at DruidExternal.makeNativeAggregateFilter (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4534:30)
    at DruidExternal.applyToAggregation (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4686:36)
    at DruidExternal.getAggregationsAndPostAggregations (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4701:22)
    at DruidExternal.getQueryAndPostProcess (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:4978:64)
    at DruidExternal.External.queryValue (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:3170:48)
    at ExternalExpression._computeResolved (/usr/local/lib/node_modules/plyql/node_modules/plywood/build/plywood.js:6788:29)
PlyQL server listening on port: 8099

The plyql docker container restarted, and after that it started returning correct results, so I suspect the problem could be caused by the plyql server. I’m using the release from the Imply 1.2.1 distribution …

Hey Jakub,

Like Charles said, it would be alarming if the Druid cache caused incorrect query results. We’d definitely treat that as a serious bug to fix asap. There aren’t any correctness issues I’m currently aware of.

If your issue does look more like a plyql problem, would you mind reporting that through one of the imply channels? e.g. https://github.com/implydata/plyql/issues or https://groups.google.com/forum/#!forum/imply-user-group.

Done : https://github.com/implydata/plyql/issues/51

Thank you