Is Druid able to handle join and subquery? How is the performance?

Hi,

I am new to Druid.
Many Business Intelligence queries involves joins of multi-tables, and subquery for intermediate results. Does Druid support such queries?

Thanks,
Ying

Hi Ying,

Druid does not support joins natively as we recommend doing joins at ETL time. The white paper (http://static.druid.io/docs/druid.pdf) goes into some detail about this, but the ways to implement joins that we are aware of are very resource intensive and are not very suited to a multi-tenant environment. I’m also not aware of any system that can do joins on large data sets with a low enough latency to power an interactive application. Druid does support subqueries for groupBys.

I would be interested in understanding your use cases for requiring joins at query time. There are potentially other ways to arrive at the same answer with still keeping query latencies very fast.

Ying,

I think there are a couple of different use cases for joins in BI. There’s dimension-to-fact-table style joins, where you want to store, say, product IDs in your fact table and product names and other details in a dimension table. Then there’s joins of different kinds of facts, where you you want to do something like store ad impressions and ad clicks separately and join them to determine e.g. the clickthrough rate.

Currently we recommend doing both kinds of joins at data integration / ETL time. In the future, the first kind of join should be doable with good performance at Druid query time (see this proposal: https://groups.google.com/forum/#!topic/druid-development/udiivvgoMO8). Even when that feature is available, the second kind of join should still be done pre-Druid.

Does that answer your question?