Low-latency topN queries and filtering on large sets of dynamic data

I have two related questions.
The first is a quick question on the wording on lookup sizes in the Druid documentation. It is about the following paragraph on http://druid.io/docs/latest/development/extensions-core/lookups-cached-global.html:
*Globally cached lookups are appropriate for lookups which are not possible to pass at query time due to their size,
or are not desired to be passed at query time because the data is to reside in and be handled by the Druid servers,
and are small enough to reasonably populate on a node. This usually means tens to tens of thousands of entries per lookup.*

Does this mean that there is no way to do ‘joins’ with large tables of dynamically changing data at query time with Druid?

The second question is more general, and I need to explain a bit before I can pose it.

My use case is that I want to do a low-latency topN query, where the items need to be filtered by availability (among others) at query time. The availability of an item is a piece of dynamically changing data that resides somewhere else (there is some freedom in choosing ‘somewhere else’, be it an RDBMS or a key value store). The total number of items is around 2 million, while the number of events is in the order of 20 million per day. Is there some way to accomplish this efficiently using Druid?
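Concretely, the kind of query I have in mind would look something like the sketch below: a topN over item IDs, filtered through a lookup that holds each item's current availability. All names here (datasource, dimension, lookup name, the "in_stock" value) are illustrative, not from any real deployment, and the `registeredLookup` extraction function assumes the lookup has already been registered on the cluster.

```python
import json

# Hypothetical Druid topN query: rank items by event count, keeping only
# items whose value in the "item_availability" lookup is "in_stock".
# Every identifier below is a placeholder for illustration.
top_n_query = {
    "queryType": "topN",
    "dataSource": "item_events",
    "dimension": "item_id",
    "metric": "count",
    "threshold": 10,
    "granularity": "all",
    "intervals": ["2017-01-01/2017-01-02"],
    "aggregations": [{"type": "count", "name": "count"}],
    # Filter via a lookup-backed extraction function: the selector matches
    # rows whose item_id maps to "in_stock" in the lookup at query time.
    "filter": {
        "type": "selector",
        "dimension": "item_id",
        "value": "in_stock",
        "extractionFn": {
            "type": "registeredLookup",
            "lookup": "item_availability"
        }
    }
}

print(json.dumps(top_n_query, indent=2))
```

The open question is whether a lookup over ~2 million items, refreshed frequently, is a reasonable thing to ask of Druid at low latency.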

Kind regards, and thanks in advance (for reading this),

Bart Frenk

The lookups extension you are referring to is not intended as a proxy for joins (though that is a common hack with mixed results).

The lookup framework in general could potentially be used for large joins, but no design work has been done on what such an implementation would look like.
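For reference, a globally cached lookup of the kind the quoted docs describe is configured with a spec along these lines. The field names follow the lookups-cached-global extension docs (JDBC-backed `cachedNamespace`); the connection details, table, and column names are placeholders, not a working configuration.

```python
import json

# Sketch of a globally cached lookup spec backed by a JDBC source, per the
# lookups-cached-global extension. All connection/table/column values are
# hypothetical placeholders.
jdbc_lookup_spec = {
    "type": "cachedNamespace",
    "extractionNamespace": {
        "type": "jdbc",
        "connectorConfig": {
            "connectURI": "jdbc:mysql://localhost:3306/inventory",
            "user": "druid",
            "password": "secret"
        },
        "table": "items",
        "keyColumn": "item_id",
        "valueColumn": "availability",
        "pollPeriod": "PT10M"  # re-poll the source every 10 minutes
    }
}

print(json.dumps(jdbc_lookup_spec, indent=2))
```

Note the polling model: the whole table is re-read on each poll, which is part of why the docs steer you toward lookups in the tens-of-thousands range rather than millions of entries.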

For the “availability” query, how transactional is your query supposed to be? What happens if the availability data changes while the query is being calculated? What about after the calculation is done but the results are still being returned to whatever issued the query?

The current lookup implementations that I know of are all eventually consistent. Anything stronger than eventual consistency is going to be a challenge to do solely in Druid… or to do fast at petabyte scale in any system.

If you want to squeeze the most performance out of Druid, writing a custom lookup extension might be the way to go; that would give you the most control over exactly the behavior you are looking for. But I don’t know of any current efforts to hook transactional data into Druid.

Does that answer your question?