I work for a bank, and my current project uses an OLAP cube for some PnL analysis. We have around 20 fact tables and 30 dimension tables. We also have client tools where users can drill, pivot, and filter the data. Could you let me know the feasibility of replacing the cube with Druid, and the challenges we would face?
Without knowing anything about your actual use case, I’m going to take a stab at what I think might be some major points you’ll want to be aware of up front:
- Druid does not do big joins. It supports small, join-like operations called lookups (e.g. postal code -> state, or other very simple star-schema mappings). If you cannot express your data in denormalized form, Druid will probably not work as expected.
- Batch ingestion is great, and batch indexing is as reliable as Hadoop. Realtime ingestion is best-effort (for now): the realtime streaming indexing component (which is optional) is still undergoing development to make its data-delivery guarantees stricter.
- Data is append-only or replace: there is no “update” command in Druid. To update prior data, you must replace the old data with a newly indexed version. The exception is lookups: a lookup’s resultant values can be modified ad hoc while the data in the data source (the lookup keys) remains the same.
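To make the lookup point concrete, here is a rough Python sketch (illustrative only, not actual Druid code or API) of the kind of query-time enrichment a lookup gives you, against denormalized rows. The column names and the postal-code-to-state map are made up for the example:

```python
# Illustrative sketch only -- not Druid code. A Druid lookup behaves like a
# flat key -> value map applied at query time, e.g. postal code -> state.

# Denormalized rows, the shape Druid wants: each row already carries its
# dimension values, plus a key we can run through a lookup.
rows = [
    {"postal_code": "10001", "pnl": 120.0},
    {"postal_code": "94105", "pnl": -45.5},
    {"postal_code": "10001", "pnl": 30.0},
]

# The "lookup": a simple one-column map. This is the only join-like
# operation available; arbitrary multi-table joins are not.
postal_to_state = {"10001": "NY", "94105": "CA"}

def enrich(row, lookup, key, out):
    """Query-time enrichment: add the looked-up value as a new dimension."""
    new = dict(row)
    new[out] = lookup.get(row[key], "unknown")
    return new

enriched = [enrich(r, postal_to_state, "postal_code", "state") for r in rows]

# Aggregate on the looked-up dimension, like a groupBy query would.
totals = {}
for r in enriched:
    totals[r["state"]] = totals.get(r["state"], 0.0) + r["pnl"]

print(totals)  # {'NY': 150.0, 'CA': -45.5}
```

If your 30 dimension tables can each be reduced to this kind of flat key -> value mapping (or folded into the fact rows at indexing time), you are in Druid's comfort zone; if they can't, that is the first feasibility problem to solve.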
The above constraints make Druid very fast, reliable, and cost-effective for its intended use cases around append-only data streams, but they also limit its flexibility for other use cases.
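The replace-not-update semantics can also be sketched in a few lines of Python (again illustrative only, not Druid code): data lives in segments keyed by time interval, and a correction means re-indexing and swapping the whole interval, never editing a row in place. The interval keys and row fields here are invented for the example:

```python
# Illustrative sketch only -- not Druid code. Druid stores data in segments
# covering time intervals; "updating" means atomically replacing an
# interval with a freshly indexed version, not mutating individual rows.

# Segments keyed by a (hypothetical) one-day interval.
segments = {
    "2016-01-01": [{"trade": "A", "pnl": 10.0}, {"trade": "B", "pnl": 5.0}],
    "2016-01-02": [{"trade": "C", "pnl": -2.0}],
}

def replace_interval(segments, interval, new_rows):
    """Druid-style replace: the old segment for the interval is dropped
    and the re-indexed data takes its place. There is no per-row update."""
    segments[interval] = new_rows

# A correction for trade B arrives: re-index the entire day with fixed data.
replace_interval(segments, "2016-01-01",
                 [{"trade": "A", "pnl": 10.0}, {"trade": "B", "pnl": 7.5}])

print(segments["2016-01-01"])
# [{'trade': 'A', 'pnl': 10.0}, {'trade': 'B', 'pnl': 7.5}]
```

For PnL restatements this matters: a late correction to one trade means rebuilding every interval that trade touches, so you will want your intervals sized with that re-indexing cost in mind.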