What-if analysis with Druid

I am trying to evaluate Druid for building an application that offers what-if analysis to users. This correlates closely with Excel-style functionality, where changing a value in one cell triggers changes in other cell values. In our case, multiple rows in various tables would be updated when the user changes a row value. The response needs to be real-time or near real-time.

Does Druid fit such a use case? If so, what would be some initial steps to evaluate Druid for it?

Thanks!

Druid’s target use case is event stream data, meaning you have a stream of immutable events you want to do slice-and-dice OLAP against.

You CAN use Druid as an immutable data store to draw data from, including various aggregations over that data. But a what-if interface would be an extra layer on top, and you would really need a reason to slice and dice the immutable historical data for Druid to make sense.
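To make the slice-and-dice idea concrete, here is a hedged sketch of a Druid native topN query expressed as JSON from Python. The datasource name, dimension, metric, and interval are all assumptions for illustration, not from the original thread:

```python
import json

# Hypothetical slice-and-dice query against an immutable Druid datasource.
# "events", "country", "edits", and the interval are made-up example names.
query = {
    "queryType": "topN",
    "dataSource": "events",
    "dimension": "country",
    "metric": "edits",
    "threshold": 10,
    "granularity": "all",
    "aggregations": [
        {"type": "longSum", "name": "edits", "fieldName": "count"}
    ],
    "intervals": ["2024-01-01/2024-02-01"],
}

# On a live cluster, this payload would be POSTed to the broker's
# /druid/v2 endpoint; here we just serialize it.
payload = json.dumps(query)
print(payload)
```

Queries like this aggregate over data that never changes after ingestion, which is why an Excel-style mutable model sits awkwardly on top of it.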

Ok. I understand that Druid doesn’t fit the what-if use case, because we have mutable data (data in different tables changes based on user input).

I didn’t follow the event stream data part. Does the data need to be streaming data (i.e., continuous data like click streams or logs) for Druid to be applicable, or did I understand it incorrectly? Won’t Druid be able to perform OLAP operations over any kind of immutable data?

Druid can do OLAP over any kind of immutable data, it doesn’t have to be streaming. Most people using Druid and finding value from it do have large streaming datasets, although there are definitely folks finding it valuable for non-streaming datasets as well. The really common thing is wanting to do OLAP / slice-n-dice type analyses.

Ok, I follow now.

Pardon me if this question doesn’t fit right on this forum, but since there are big data experts here: what technology should I explore for the use case I described in my original post, i.e., what-if analysis?

“This correlates closely with Excel-style functionality, where changing a value in one cell triggers changes in other cell values. In our case, multiple rows in various tables would be updated when the user changes a row value. The response needs to be real-time or near real-time.”

A few pointers to get started would help. Thanks!

Depends on the scale you’re talking about.

If I had to define a sequence of pretty well-defined procedures for which I constantly changed the original input values or other parameters at various stages (but not in a programmatic way), and which could use a good-sized dataset (TB or larger), I would probably use Spark if I could, or Hadoop if I had to.
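That “fixed stages, swappable parameters” idea can be sketched in plain Python, assuming made-up stage names; in Spark the stages would be DataFrame transformations instead of list operations:

```python
# Illustrative pipeline stages -- the names and logic are assumptions,
# not any real Spark API.
def clean(records, min_value):
    """Drop records below a threshold."""
    return [r for r in records if r >= min_value]

def scale(records, factor):
    """Apply a multiplicative adjustment."""
    return [r * factor for r in records]

def summarize(records):
    """Reduce to a single figure (here, the mean)."""
    return sum(records) / len(records)

def run_pipeline(records, params):
    """Same sequence of stages every run; only the parameters change."""
    cleaned = clean(records, params["min_value"])
    scaled = scale(cleaned, params["factor"])
    return summarize(scaled)

data = [1, 5, 10, 20]

baseline = run_pipeline(data, {"min_value": 0, "factor": 1.0})
what_if = run_pipeline(data, {"min_value": 2, "factor": 2.0})
```

Each what-if scenario is just a re-run with different parameters, which batch engines like Spark handle well as long as the responses don’t need to be interactive at cell-edit speed.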

If you want to have some sort of automated optimization, then you’re in a whole other can of worms (which Spark can still handle in some areas).