As I’m having a hard time finding reading material about Druid for lighter use cases with only one physical server, I’d be happy if you guys could give me your opinions on whether going with Druid might be worthwhile for my use case.
Currently we are using a single node of Elasticsearch for ingesting and querying our data. The data volume is about 20 million rows per day, with 22 dimension columns and 2 metric columns. The data is delivered as CSV and JSON files only, which we currently transform into a unified format and then ingest using parallel HTTP POST requests to Elasticsearch. We chose Elasticsearch a couple of years ago only for its aggregation performance, and have no need for full-text search.
While the daily data volume isn’t really big, our use case is to keep the data around for at least 5 years and query it interactively. For time ranges longer than a year, Elasticsearch is simply not fast enough to be called “interactive”.
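To put the retention requirement into numbers, here is a rough back-of-the-envelope sketch. It only uses the 20 million rows per day and the 5 years mentioned above; the 365-day year and the helper name `total_rows` are my own simplifications.

```python
# Rough row-count estimate from the figures in this post.
ROWS_PER_DAY = 20_000_000   # about 20 million rows per day
RETENTION_YEARS = 5         # minimum retention we need
DAYS_PER_YEAR = 365         # assumption: leap days ignored


def total_rows(rows_per_day: int, years: int, days_per_year: int = DAYS_PER_YEAR) -> int:
    """Number of rows accumulated over the given number of years."""
    return rows_per_day * years * days_per_year


rows_one_year = total_rows(ROWS_PER_DAY, 1)
rows_retained = total_rows(ROWS_PER_DAY, RETENTION_YEARS)

print(f"rows in one year:   {rows_one_year:,}")
print(f"rows in five years: {rows_retained:,}")
```

So a query over the full retention period would have to cover roughly 36.5 billion raw rows, assuming no pre-aggregation at ingestion time.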
Our hardware budget is very limited: we are running Elasticsearch on a single 12-core Intel Xeon E5-1650 machine with 128 GB RAM and hard drive storage (no SSD).
My question is: Would it be feasible, and a clear improvement, to swap out Elasticsearch for Druid on this hardware? For queries over a long time range (1-5 years) that calculate a sum on a metric column, grouped by one to several dimensions, is it likely that we could keep response times within a couple of seconds even on a single-machine setup?
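To make the query shape concrete, here is a minimal sketch of how I imagine such a request would look if sent to Druid's SQL HTTP endpoint (`/druid/v2/sql`) as a JSON body. The table name `events`, the metric `revenue` and the dimensions `country` and `device` are made-up placeholders, not our real schema, and the helper `build_sum_query` is hypothetical. The code only builds the payload; it does not contact any server.

```python
import json


def build_sum_query(table, metric, dimensions, start, end):
    """Build a Druid SQL payload that sums one metric, grouped by dimensions.

    All names passed in are placeholders; values are interpolated without
    escaping, so this is only suitable for trusted, hard-coded input.
    """
    dims = ", ".join(f'"{d}"' for d in dimensions)
    sql = (
        f'SELECT {dims}, SUM("{metric}") AS total '
        f'FROM "{table}" '
        f"WHERE __time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}' "
        f"GROUP BY {dims}"
    )
    return {"query": sql}


# Hypothetical example: a five-year range, grouped by two dimensions.
payload = build_sum_query(
    table="events",
    metric="revenue",
    dimensions=["country", "device"],
    start="2015-01-01 00:00:00",
    end="2020-01-01 00:00:00",
)
print(json.dumps(payload, indent=2))
```

This is the kind of query I would like to see answered within a couple of seconds.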
I am sure such a setup would benefit a lot from SSD storage, but to justify the cost it would be good to know approximately how large that benefit would be.
Thank you for any input on this matter.