The use cases for Druid I’ve seen so far have all been big-data scenarios involving petabytes of data.
However, I’m wondering if Druid can also be a good, cost-effective choice for small-scale deployments. The benefit would be building a solid architecture once and then scaling it up gradually over time, instead of starting with an RDBMS or similar and having to rearchitect the whole thing when it becomes difficult to scale.
As a starting point, imagine a web startup that needs to track only about 5,000 events per day. They’re already pushing events to Kafka, so we can assume that will be the integration point for Druid ingestion.
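For concreteness, I assume the integration would go through Druid’s Kafka indexing service, with a supervisor spec roughly along these lines (datasource name, topic, column names, and broker address are all hypothetical placeholders, not from a real deployment):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "app-events",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["userId", "eventType"] },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none" }
    },
    "ioConfig": {
      "topic": "events",
      "consumerProperties": { "bootstrap.servers": "kafka:9092" },
      "taskCount": 1,
      "replicas": 1
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

At 5,000 events per day a single ingestion task (`taskCount: 1`, `replicas: 1`) should be more than enough, which is part of why I’m hoping the whole thing can run on very modest hardware.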
What would be a minimal EC2 setup to support such a scenario?
Given that the Druid quickstart tutorial recommends a minimum of 2 vCPUs and 8 GB of RAM, could one get away with running the whole Druid ecosystem on a single m4.large node, for example? Or would it be better to cluster together some even smaller instances?
A related question when aiming to scale as small as possible: how is Druid’s durability without redundancy? Will Kafka + a single Druid node + S3 deep storage be durable, and will it catch up on data not yet persisted to S3 in case of a crash? (Sorry if I’m missing something obvious here; I have yet to dig deep into the Druid documentation.)