We have been using Druid as our OLAP engine, and right now we are consolidating the technology stack with another internal service that currently uses Kylin for building MOLAP cubes and serving queries via ANSI SQL.
There are pros and cons to each solution as it currently stands (and both are evolving fast). Instead of an either/or approach, is anyone aware of scenarios where the two solutions can complement each other?
Thanks in advance.
It’s possible to combine them in ways that make sense. You could store a more granular table in Druid, with a lot of dimensions, and use Kylin to create some of the very high-level aggregations. This plays to Druid’s strength of handling lots of dimensions without breaking down, and Kylin’s strength of “cheating” on high-level aggregations by precomputing them. You don’t want to precompute all aggregations at the lowest level of granularity (it’s generally too expensive), but precomputation can help with the high-level ones.
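To make the trade-off concrete, here is a toy sketch (plain Python, not the Druid or Kylin APIs): the list of rows stands in for a granular Druid datasource, and the precomputed dict stands in for a Kylin cuboid. Materializing only the coarse rollup is cheap, while a cuboid over every dimension combination would explode combinatorially.

```python
from collections import defaultdict

# Granular "Druid" table: many dimensions, one row per event (toy data).
events = [
    {"country": "US", "device": "mobile", "hour": 0, "revenue": 5.0},
    {"country": "US", "device": "desktop", "hour": 1, "revenue": 3.0},
    {"country": "DE", "device": "mobile", "hour": 0, "revenue": 2.0},
]

def precompute_cuboid(rows, dims):
    """Precompute one high-level rollup (a 'cuboid') over the given dims."""
    cuboid = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        cuboid[key] += row["revenue"]
    return dict(cuboid)

# Only the coarse rollup is materialized; granular queries scan the raw rows.
by_country = precompute_cuboid(events, ["country"])
print(by_country)  # {('US',): 8.0, ('DE',): 2.0}
```

A high-level query then becomes a dictionary lookup instead of a scan, which is the “cheating” referred to above.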
But as you said, both are rapidly evolving. Druid in particular is evolving to support more of ANSI SQL (it has supported a subset since Druid 0.10.0). It may also evolve in ways that make it easier to generate multiple rollups and choose the best one to query, Kylin-style.
I’ll be working on this soon. Would be awesome to share ideas.
I know that primitive realizations are possible by handling the rollups and the delegation to the best datasource on the client side, but I would like to see support in Druid itself. Druid would need to be aware of which intervals of an aggregated view are up to date relative to the master datasource. The broker would then chunk incoming queries so that as much of the query interval as possible is served from the fastest peer datasource. For any intervals where that peer is not in sync with the master datasource, those chunks would be served by the next-best peer, and so on, with the master as a safe fallback. Under the hood, a first approach could handle this with a query runner similar to the union query runner.
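The delegation described above can be sketched client-side in a few lines. This is purely illustrative, assuming peers are ordered fastest-first and each peer is in sync with the master up to some timestamp; the names (`Datasource`, `plan_query`, `fresh_until`) are invented for the sketch and are not Druid APIs.

```python
class Datasource:
    def __init__(self, name, fresh_until):
        self.name = name
        # Timestamp up to which this datasource is in sync with the master.
        self.fresh_until = fresh_until

def plan_query(start, end, peers, master):
    """Assign sub-intervals to the fastest in-sync peer; master is the fallback."""
    plan = []
    cursor = start
    # Peers are assumed ordered fastest-first (e.g. coarsest rollup first).
    for peer in peers:
        if cursor >= end:
            break
        if peer.fresh_until > cursor:
            upto = min(peer.fresh_until, end)
            plan.append((peer.name, cursor, upto))
            cursor = upto
    if cursor < end:  # the master is always in sync, so it covers the rest
        plan.append((master.name, cursor, end))
    return plan

peers = [Datasource("daily_rollup", 100), Datasource("hourly_rollup", 150)]
master = Datasource("master", float("inf"))
print(plan_query(0, 200, peers, master))
# [('daily_rollup', 0, 100), ('hourly_rollup', 100, 150), ('master', 150, 200)]
```

In Druid itself, the broker would do this planning using segment metadata rather than explicit freshness timestamps, which is why native support would be much cleaner than a client-side workaround.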
Later on, I’d love to see a change in the historicals such that each segment in the segment list of a historical query carries its own reference to a datasource. At the moment, all segment references rely on the query’s single datasource section, which I believe is unnecessary.
With this change, cube queries could be sped up even further. I can’t wait to have this feature in Druid, even though it will be hard and heavyweight to implement, especially when considering multiple rollups in combination with union queries and the like.
Meituan.com (美团点评, one of the most popular internet services in the China market) uses both Apache Kylin and Druid: one for historical OLAP and the other for real-time analytics.
Meituan developed a “Kylin on Druid” solution that has been in production use for a while. At the August 2018 Kylin meetup, Meituan shared their experience of running Kylin together with Druid. Meituan has now open-sourced their implementation, and the Kylin community is discussing it; it is quite impressive.
On Tuesday, May 16, 2017 at 4:58:21 AM UTC+8, zhw…@gmail.com wrote: