Questions about query caching

We are currently looking into the caching functionality of Druid and bumped into a couple of issues and open questions.
To explain the context of my questions fully, I feel that I need to describe our setup and our intentions. Feel free to skip ahead to the questions and to only consult the next “chapter” of my long post if needed.
our setup:

  • our production cluster has caching enabled on the historicals and uses local caches (MapCache).
  • on our stage cluster we are testing a hybrid cache setup with Caffeine as L1 and memcached as L2 cache
  • we use the lookups-cached-global extension to define a couple of lookups and have some that are based on pulling mapping files from S3 and others that are map-based.
  • initially we had a poll period of 5 minutes configured for the S3-based lookups

We noticed that query caching didn’t work as we were expecting, and analyzing the issues was quite difficult as several issues overlapped:
  • in-tier replication makes local caching less effective because one and the same query needs to be submitted several times before all local caches contain the resultset.
  • negative query priorities aren’t lower than positive ones
  • queries that could be fully served from cache get in line with all other queries waiting to be executed, so just because a query is fully cached doesn’t mean that Druid will serve it up quickly if there are other queries hitting the cluster
After we had disentangled these issues, we still observed that caching didn’t work as expected.
We were sending the same query over and over again, and sometimes Druid would consult the cache and sometimes it would recompute the query.
After a while we found out that this was related to the poll period we had configured. With a poll period of five minutes, the cache keys for queries based on such lookups would only be “found” in the cache for the next five minutes and would then be recomputed.
So we increased the period to 3 hours and rewrote low-cardinality lookups as map-based lookups, which seem to be cached longer.
After that we tried to look into the source code to see what’s happening, but it is difficult to get the big picture, as many classes seem to be involved and it is not even possible to determine which classes come into play and which don’t.
our goal:

  • is for cached query results to survive for a long time (they should survive a cluster restart and updates to lookup definitions, as long as the definitions don’t change content-wise)
  • our vision is to operate a distributed cache that turns into an OLAP cube over time
our issues:

Looking into the code, this doesn’t seem to be happening:
  • the NamespaceLookupExtractorFactory class puts a random ID into every cache key, so cache entries don’t seem to survive node restarts
  • the MapLookupExtractor class seems to form a cache key that consists of every key-value pair defined
  • the LookupExtractionFn encodes injectivity, optimize, and other flags in its cache key, but no code seems to call the isInjective() property of any of the classes I looked at.
  • the RegisteredLookupExtractionFn meanwhile does not encode injectivity, nor does it contain any random IDs in the cache key it generates
our questions:
  • is it possible, now or in the near future, to have robust cache keys that only expire if they absolutely have to? We want memcached or some other remote cache to grow into an OLAP cube over time.
  • does the new lookups-cached-single extension improve the lifetime of cache-keys?
  • is there any way we can avoid the current invalidation of cache keys on every poll period?
  • there doesn’t seem to be any mechanism that cleans up outdated cache entries, or is there?
  • we see a very low hit rate on memcached (2%). Where does memcached come in anyway if cache entries based on lookups grow stale and cache keys change on every node restart? Is it mostly helping out in situations where in-tier replication is configured? Am I missing some other aspect?
  • if a lookup is injective, I imagine that it is possible to cache result sets based on the original dimension values and translate them via lookups at query time, but I cannot find any code which would implement this behaviour. Is such an optimization present for injective lookups? If not, can it be had, or wouldn’t it work?

sorry for the long text.
thanks
Sascha

I don’t have much experience with the namespace-based lookups, so I cannot answer those questions, but let me respond to the things that I can:

  1. negative priorities not being less than positive ones sounds like a bug; please file an issue if you haven’t already

  2. the broker can rewrite lookups out of queries if the lookup is “one-to-one”, which I believe is the definition of “injective”, though I’ll admit that word is not normally in my vocabulary so I could be wrong. The logic for the rewrite is in the queries themselves, largely in the “preMerge” section of the QueryToolChest. Any query whose lookups are rewritten that way should have long-lived, per-segment cache results (the historicals won’t even see the lookups anymore).
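
To illustrate the idea, here is a conceptual sketch (my illustration, not the actual toolchest code) of why injectivity makes the rewrite safe: the historicals group on the raw dimension, and the broker applies the lookup after merging, with no re-aggregation needed because no two raw keys can collapse into the same mapped value.

// Conceptual sketch only -- NOT Druid's implementation. Shows what
// "rewriting the lookup out of the query" buys you for one-to-one lookups.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public final class InjectiveRewriteSketch
{
  // mergedByRawKey: per-dimension results as merged from the historicals,
  // keyed by the raw dimension value (e.g. a gender id).
  static Map<String, Long> applyLookupPostMerge(
      Map<String, Long> mergedByRawKey,
      Function<String, String> oneToOneLookup)   // e.g. "F" -> "Female"
  {
    Map<String, Long> mapped = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : mergedByRawKey.entrySet()) {
      // safe only because the lookup is one-to-one: no two raw keys map to
      // the same value, so no rows need to be re-aggregated after mapping
      mapped.put(oneToOneLookup.apply(e.getKey()), e.getValue());
    }
    return mapped;
  }
}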

One other option you can explore is writing your own Lookup implementation. This would be especially useful if there is something you know about your data that you want to take advantage of which is not easily generalized for everyone.

–Eric

Awesome! Thanks a lot Eric.
What you wrote about the broker rewriting one-to-one lookups is exactly what I needed to hear. If one-to-many lookups cannot be cached as well, we’ll restrict ourselves to one-to-one lookups only. So this would work fine for us.

One question about the one-to-one lookups, which correspond to the “injective” flag with which one signals to Druid that it can safely assume a one-to-one relationship: I’m currently seeing these injective flags only in the dimension extraction functions. Logically, I’d say that injectivity should be defined in the mapping definitions already. A set of IDs that are mapped to a set of values either have a one-to-one relationship or they don’t. It’s not query-specific whether this relationship holds, at least that’s my current understanding.
So it would be quite desirable if we didn’t have to leave it up to clients whether this flag gets set but could communicate injectivity to Druid on the server-side, enforcing that queries for one-to-one lookups are always long-term-cacheable.

Is it possible to tell Druid that a lookup is injective within the lookup configuration rather than in the client queries?
thanks
Sascha

You'll notice on the LookupExtractor interface there is a method "isOneToOne()" which is defined exactly for that purpose. Technically, code can be written such that the property is provided by the lookup itself or by the query (that's up to the lookup implementation), but that's the purpose of that method.
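
If you go the custom-lookup route, a rough sketch might look like the following. Method and class names here are from my memory of the 0.9.x LookupExtractor API, so treat this as a sketch and verify it against the version you run.

// Rough sketch of a custom lookup that declares itself one-to-one and
// returns a stable cache key. In real code this would extend Druid's
// LookupExtractor; the override annotations are omitted for that reason.
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticInjectiveLookup
{
  private final Map<String, String> forward = new HashMap<>();  // id -> name
  private final Map<String, String> reverse = new HashMap<>();  // name -> id

  public StaticInjectiveLookup(Map<String, String> mapping)
  {
    for (Map.Entry<String, String> e : mapping.entrySet()) {
      forward.put(e.getKey(), e.getValue());
      reverse.put(e.getValue(), e.getKey());  // assumes values are unique
    }
  }

  public String apply(String key)
  {
    return forward.get(key);
  }

  public List<String> unapply(String value)
  {
    String key = reverse.get(value);
    return key == null ? Collections.<String>emptyList()
                       : Collections.singletonList(key);
  }

  public boolean isOneToOne()
  {
    return true;  // signals that the broker may rewrite/unapply this lookup
  }

  public byte[] getCacheKey()
  {
    // stable across restarts; bump the version string when the content changes
    return "static-injective-lookup-v1".getBytes(StandardCharsets.UTF_8);
  }
}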

--Eric

I will try to add to what Eric said with respect to the Lookup stuff.

  • the NamespaceLookupExtractorFactory class puts a random ID into every cache key, so cache entries don’t seem to survive node restarts

Not sure how the random ID is the issue here. In fact, a process restart will clear the lookup cache anyway, so I don’t see how you could serve out of it?

  • the MapLookupExtractor class seems to form a cache key that consists of every key-value pair defined

which IMO makes sense, right? The content of the map defines the signature.

  • the LookupExtractionFn encodes injectivity, optimize, and other flags in its cache key, but no code seems to call the isInjective() property of any of the classes I looked at.

Sorry for the confusion, but as Eric said, it does optimize the one-to-one case.

If you want to see how it works, look for the usages of this method: https://github.com/b-slim/druid/blob/12be1c0a4bbbb3f955e57a8dc18c4279fd8f29e2/processing/src/main/java/io/druid/query/extraction/ExtractionFn.java#L120-L120

Both the GroupBy and TopN toolchests optimize the one-to-one extraction function.

Also, there is another optimization we have in place for filters: if the broker can unapply the mapping, the query will be rewritten with a selector filter instead of an extraction-based filter.
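
To make “unapply” concrete, here is a conceptual sketch (my illustration, not the actual broker code): a filter on the mapped value is turned into a filter on the raw dimension values that produce it, so the historicals never see the lookup.

// Conceptual sketch, not Druid's code. For a one-to-one lookup the result
// is a single raw value, so an extraction-based filter can be rewritten
// into a plain selector filter on the base dimension.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class FilterUnapplySketch
{
  static List<String> unapply(Map<String, String> lookup, String mappedValue)
  {
    List<String> rawValues = new ArrayList<>();
    for (Map.Entry<String, String> e : lookup.entrySet()) {
      if (e.getValue().equals(mappedValue)) {
        rawValues.add(e.getKey());  // e.g. "Female" -> ["F"]
      }
    }
    return rawValues;
  }
}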

  • the RegisteredLookupExtractionFn meanwhile does not encode injectivity, nor does it contain any random IDs in the cache key it generates

This class registers the object as is. If you pass in a one-to-one lookup, it will treat it as one-to-one.

our questions

  • is it possible, now or in the near future, to have robust cache keys that only expire if they absolutely have to? We want memcached or some other remote cache to grow into an OLAP cube over time.

Can you define “have to”? I’m not sure what kind of life cycle the cache or the key-value pairs would have.

  • does the new lookups-cached-single extension improve the lifetime of cache-keys?

The new module does introduce a different caching concept, but the life cycle is still tied to the life cycle of the Druid process.

  • is there any way we can avoid the current invalidation of cache keys on every poll period?

Not sure if I am getting this right. Is the question how you make sure that the current poll did not add/remove some entries?

I guess the cache key of the lookup needs to change on every poll, since you might have added/removed entries.

  • there doesn’t seem to be any mechanism that cleans up outdated cache entries, or is there?
  • we see a very low hit rate on memcached (2%). Where does memcached come in anyway if cache entries based on lookups grow stale and cache keys change on every node restart? Is it mostly helping out in situations where in-tier replication is configured? Am I missing some other aspect?
  • if a lookup is injective, I imagine that it is possible to cache result sets based on the original dimension values and translate them via lookups at query time, but I cannot find any code which would implement this behaviour. Is such an optimization present for injective lookups? If not, can it be had, or wouldn’t it work?

Not sure I am getting this: are we talking about the query cache or the lookup cache? They are totally independent.

Please let me know if this makes sense. I would love to hear more about what kind of lookup cache your use case might need.

Thanks a ton to both of you for giving this invaluable feedback.

After I learned from Eric about the optimizations Druid tries to do, I studied the respective code and found that this SHOULD do what I need, namely to keep the query cache valid across poll periods for queries that have numerical sorting and have either no lookups or limit themselves to one-to-one lookups.

So, by the way, I was mostly referring to the query caching in my post, not the lookup caches.

I tried to verify whether the optimization in the QueryToolChest for injective lookups works, but I do not observe it so far. I attached a query that I used for testing. I submit it and it has a dimension split on an injective lookup. I declared it to be injective but see that after each poll period, Druid recomputes the query.

As for map-based lookups, I find it wasteful that the whole key-value map is encoded into the cache key.
I think this is an important information to communicate to people so that they don’t use too large map definitions.
I’d like it better if, instead, a strong consistent hash value were computed and used in the cache key.
Even better would be if one could manually declare a version for each lookup and this version alone would make it into the cache key.
Sure, if one forgets to update that version one would get corrupt results back, but it would give users more control over caching.
If I know that the contents of a lookup don’t change, I don’t increase the version and vice versa.
With regards to the lookup cache, perhaps one could have an additional check during each poll for whether the content really changed.

druid-injective-lookup-caching-test.txt (2.23 KB)

After I learned from Eric about the optimizations Druid tries to do, I studied the respective code and found that this SHOULD do what I need, namely to keep the query cache valid across poll periods for queries that have numerical sorting and have either no lookups or limit themselves to one-to-one lookups.

Great!

So, by the way, I was mostly referring to the query caching in my post, not the lookup caches.

I tried to verify whether the optimization in the QueryToolChest for injective lookups works, but I do not observe it so far. I attached a query that I used for testing. I submit it and it has a dimension split on an injective lookup. I declared it to be injective but see that after each poll period, Druid recomputes the query.

First, what version of Druid are we talking about?

Second, what kind of lookup are you using? Can you share the spec of the lookup?

FYI, having a one-to-one mapping doesn’t imply that the cache key is the same after every lookup poll.

If you are sure that the lookup doesn’t change over the life cycle of the Druid process, then set the poll period to ZERO; that will ensure that you always have the same version.

As for map-based lookups, I find it wasteful that the whole key-value map is encoded into the cache key.

Maybe, but we have to start somewhere :smiley:

I think this is an important information to communicate to people so that they don’t use too large map definitions.

You are right; I thought the docs mentioned that. I will add more emphasis.

I’d like it better if, instead, a strong consistent hash value were computed and used in the cache key.

Not sure I am getting this. Which cache key? As part of the API, every lookup needs to return a cache key; am I missing something?

Even better would be if one could manually declare a version for each lookup and this version alone would make it into the cache key.

How is that different from hashing the entries?

Sure, if one forgets to update that version one would get corrupt results back, but it would give users more control over caching.
If I know that the contents of a lookup don’t change, I don’t increase the version and vice versa.

If the lookup doesn’t change, set the poll to ZERO.

With regards to the lookup cache, perhaps one could have an additional check during each poll for whether the content really changed.

That will be very expensive in some cases. The idea behind the poll is to be agnostic to what was in the cache before.

BTW, for the JDBC cached-global lookup you can define a ‘tsColumn’, which is the column in the table that records when the key was updated, BUT there is a nasty bug that needs to be fixed before you can move to JDBC.

thanks for your reply Eric. Sorry for the late response.

To make it easier to read my replies without too much confusion, I have tried to state more explicitly whether I’m talking about the query cache or the lookup cache, and I furthermore capitalize QUERY/LOOKUP so that it’s easier to follow along.

So, by the way, I was mostly referring to the query caching in my post, not the lookup caches.

I tried to verify whether the optimization in the QueryToolChest for injective lookups works, but I do not observe it so far. I attached a query that I used for testing. I submit it and it has a dimension split on an injective lookup. I declared it to be injective but see that after each poll period, Druid recomputes the query.

First, what version of Druid are we talking about?

We are running Druid 0.9.1.1 with a patch that uses Guice 4.1, because the older Guice version would mask stack traces.
Currently, both the legacy lookup extension and the lookups-cached-global extension are loaded and have lookups defined as a leftover from the transition phase, but only the lookups-cached-global lookups are in use.

Second, what kind of lookup are you using? Can you share the spec of the lookup?

We use lookups-cached-global and have two kinds of lookups:

One kind is based on mapping files residing in S3:

"adspacewithid_by_adspaceid": {
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "uri",
    "uriPrefix": "s3://XXXX/adspace/",
    "fileRegex":".*V2.*",
    "namespaceParseSpec":{
      "format":"tsv",
      "delimiter":"|",
      "columns":[
        "AdspaceId",
        "AdspaceName",
        "AdspaceNameWithId",
        "AdspaceCategoryId",
        "AdspaceCategoryName",
        "AdspaceCategoryIAB",
        "AdspaceCategoryNameWithIAB",
        "ApplicationId",
        "ApplicationName",
        "ApplicationNameWithId",
        "ApplicationCategoryId",
        "ApplicationCategoryName",
        "ApplicationCategoryIAB",
        "ApplicationCategoryNameWithIAB",
        "ApplicationType",
        "PublisherId",
        "PublisherName",
        "PublisherNameWithId",
        "SalesOffice",
        "MultiplierId",
        "MultiplierName"
      ],
      "keyColumn":"AdspaceId",
      "valueColumn":"AdspaceNameWithId"
    },
    "pollPeriod": "PT3H"
  },
  "firstCacheTimeout": 0
},

All low-cardinality lookups have been migrated to map-based lookups that look like this:

"gender_by_genderid": {
  "type": "map",
  "map": {
    "U": "Unknown",
    "F": "Female",
    "M": "Male"
  }
},

The S3-based lookups have poll periods of 3 hours, and we observe that Druid doesn’t find queries that make use of these lookups in the QUERY caches across poll periods.

For the map-based lookups we observe that Druid can make use of the cached entries in the QUERY caches for longer than the poll period (of course). So that’s nice, but I still need to test whether the cache keys stay the same even after cluster restarts.

FYI, having a one-to-one mapping doesn’t imply that the cache key is the same after every lookup poll.

If you are sure that the lookup doesn’t change over the life cycle of the Druid process, then set the poll period to ZERO; that will ensure that you always have the same version.

I’m trying to make Druid use the optimization that you brought to my attention, namely that the historical nodes don’t even get to know about the lookup because the broker already rewrites the query such that the historicals query and merge based on a dimension, and the broker then translates the results via the lookup mapping. Prerequisites for this seem to be a one-to-one mapping definition and a numeric sort order.

I made a query that features both, but I still observe that Druid recomputes the query instead of fetching the intermediate results from its QUERY caches after about 3 hours, which is what we have set as the poll period. So it seems that the cache keys in the QUERY caches still contain lookup-related tokens which differ across poll periods.

Setting the poll period to zero unfortunately doesn’t work for our use case because our lookup definitions occasionally change.

The poll period to my understanding tells Druid how often to check for updates to a mapping file. Each time Druid checks, I figure that there might be two outcomes: either the content of the lookup mapping file has changed since the last poll or it hasn’t.
By “robust behaviour” I meant that Druid ideally would only produce different cache keys for the QUERY cache if it detects that the contents of a mapping file actually changed. Polling doesn’t mean that they did, only that Druid should check.

I’d like it better if, instead, a strong consistent hash value were computed and used in the cache key.

Not sure I am getting this. Which cache key? As part of the API, every lookup needs to return a cache key; am I missing something?

Sorry, my bad. I’m mostly concerned with what’s going on in the QUERY cache. The cache keys that get created for the QUERY cache and that refer to a lookup contain some notion of a “revision”. Depending on which KIND of lookup a cache key is based on, it makes sense to use different schemes for what this revision is.

The usecase for file-based lookups (S3) is that they are high-cardinality and might change occasionally. Ideally, Druid would change the revision ID in its QUERY cache keys for that lookup if and only if the contents of the mapping file actually changed.

The more interesting case is map-based lookups, i.e. the inline maps one can define. It would be sad if the QUERY cache keys for those lookups changed every time somebody submits some change to a lookup spec or the cluster restarts, because as long as the contents of a mapping don’t change, the cache keys should also stay the same. This already seems to be the case, as the whole content of the mapping becomes the cache key. An alternative way of giving operators control would be to let them specify a revision ID along with the mapping definition. In the case of inline mappings, the use case is probably that these lookups only change when a human manually modifies them, and then they could also bump the revision ID. The downside of this is clear; the upside would be no longer having long cache keys.
To get rid of the downside of needing to remember to update the revision ID manually, one could instead use a strong hash algorithm, such as a maximum-entropy hash computed over the whole key-value content.
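
To make the idea concrete, here is a sketch of such a content hash (my own illustration, nothing that exists in Druid today). Sorting the entries first makes the hash independent of map iteration order, so it stays stable across polls and restarts as long as the content doesn’t change.

// Illustration only: a deterministic hash over a lookup's key-value content
// that could serve as the "revision" part of a QUERY cache key.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public final class LookupContentHash
{
  static byte[] sha256(Map<String, String> lookup) throws NoSuchAlgorithmException
  {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    // TreeMap sorts by key, making the digest order-independent
    for (Map.Entry<String, String> e : new TreeMap<>(lookup).entrySet()) {
      digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
      digest.update((byte) 0x00);  // separator against ambiguous concatenation
      digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
      digest.update((byte) 0x01);
    }
    return digest.digest();  // 32 bytes, stable for identical content
  }
}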

Even better would be if one could manually declare a version for each lookup and this version alone would make it into the cache key.

How is that different from hashing the entries?

Good question. I assumed that some developers would regard a hash as not safe enough, in that the content could in theory change while the hash stays the same. Alternatively, computing the hash might be considered too expensive. My intuition on both points is otherwise, but for those who have these concerns it might be nice to offer a revision ID that can be manually controlled by operators. In that case, I would be in control of when cache keys change, and I would change them only when I know that my lookup content has changed. Caching in Druid is highly complex, and that brings with it corner cases in which Druid behaves differently from what operators expect. Simple is beautiful. If I knew that Druid would just take my lookup revision as the part of the QUERY cache key which needs to change with each update to the lookup content, then I could be more confident about what’s going on inside Druid while I’m in investigation mode.

Sure, if one forgets to update that version one would get corrupt results back, but it would give users more control over caching.
If I know that the contents of a lookup don’t change, I don’t increase the version and vice versa.

If the lookup doesn’t change, set the poll to ZERO.

The lookup period only tells Druid how often to check for updates. For a time range spanning 10 polls, the mapping data might change only once. I would want Druid to detect that one update, which would not happen with a poll period of ZERO, and at the same time I would ideally expect Druid to detect, for the other nine polls, that the content did not change, and not update its QUERY cache keys.

With regards to the lookup cache, perhaps one could have an additional check during each poll for whether the content really changed.

That will be very expensive in some cases. The idea behind the poll is to be agnostic to what was in the cache before.

I think I’ve meanwhile seen that the last-modification timestamp is being checked. If so, maybe I can try to optimize things within our application that provisions the files in S3. Computing a strong hash based on the content is not expensive though, right? Druid reads the whole content anyway and puts it either into heap or off-heap memory.

BTW, for the JDBC cached-global lookup you can define a ‘tsColumn’, which is the column in the table that records when the key was updated, BUT there is a nasty bug that needs to be fixed before you can move to JDBC.

I’m curious to learn about it and about the new cached-single extension, but I don’t understand how individual revisions per lookup key-value mapping fit into the big picture of Druid’s lookup mechanism.

Our main and actually only objective is that a query result that Druid has computed stays in a persistent remote QUERY cache indefinitely and never becomes stale. So far we observe that lookups interfere too strongly with this vision.
On this note, it is great that Druid can reuse some query cache entries. If I query the last 7 days and then extend the query to 30 days, the second query is different from the first, and yet Druid can make use of the cache entries that the first query produced, because the cache entries are per segment.
What’s not so great is that if I next query for a subset of metrics, it would again be possible to serve the query from the QUERY cache, but Druid is not capable of doing this.
It would be very desirable to improve this in the future, either by caching per metric or by merging cache entries that match in filter and aggregation conditions and only differ in metrics (see the sketch below).
Maybe then it would also make sense to have an external application that submits occasional queries to warm up the cache.
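
For illustration, here is a sketch of that proposal (not current Druid behaviour): a cached per-segment row holding metrics {A, B, C} could answer a query that asks only for {A, B} by projecting the cached row.

// Sketch of the proposal only -- Druid does not do this today. Serve a query
// that asks for a subset of metrics by projecting a cached row that contains
// a superset of them.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public final class MetricSubsetSketch
{
  static Map<String, Object> project(Map<String, Object> cachedRow,
                                     Set<String> wantedMetrics)
  {
    Map<String, Object> projected = new LinkedHashMap<>();
    for (String metric : wantedMetrics) {
      if (cachedRow.containsKey(metric)) {
        projected.put(metric, cachedRow.get(metric));
      }
    }
    return projected;
  }
}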

meant to write “thanks for your reply SLIM and Eric”. sorry :wink: