Filtering using dimension subquery on some datasource

Hi,
I’m new to Druid and I understand joins are not supported but is there a way to filter rows by matching a dimension value with a subquery result?

For example, I have historical user events containing different dimensions/metrics and I’d like to filter them according to current user profile values, it’s not possible to do denormalisation at ingestion time as the filter must relate to current users state.

In SQL it would have looked something like this:

SELECT username, SUM(deposit) AS deposits

FROM user_events

WHERE username IN (SELECT username FROM user_profile WHERE frozen = 1)

GROUP BY username

As far as I can see, the best approach is to do two queries. Its quite important to note that users info isn’t the type of info someone should put into Druid, because it’s not tied to time the same way events are tied to the time axis. Users info may be in something like an SQL or noSQL-Document-Oriented storage rather than in a time-series database.

This leaves no choice: doing two queries seems the best approach.

I can see why maintaining current user info table doesn’t quite fit the Druid time based approach, Nevertheless, Druid seems very powerful with single table analytics but filtering based on other data could be very valuable as well, this leads me to the next questions:

Assuming first query (on user info) performed on some other db returns high cardinality result set, how can it be used in Druid query and what are the size limitations on query length?

Any ideas on how can a dimension be filtered using an external query which returns a large result set (e.g. users that exists in an external query result set)?

Try https://github.com/implydata/plyql for SQL queries. It acts as an advanced query planner for Druid as well.