[druid-user] load patterns suitable for simple clustered deployment


I am examining Druid and need to get a sense of the cost involved in running a cluster on AWS.

I was looking at the deployment specification suggested in Druid’s documentation.

Could anyone give some general examples of load patterns (in terms of ingestion rate and query volume) that this ‘simple cluster’ would be comfortable with?
I understand that there are many factors that come into play here, but I really need just a very general idea.


Hi Baral!

You are absolutely right about the number of factors!

For example, with a throughput of roughly 10,000 rows per second per core, you would need 10 cores dedicated to ingestion in the MiddleManagers to handle 100,000 rows per second. MiddleManagers also serve real-time queries, so extra cores are needed for that, too.
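To make that arithmetic concrete, here is a back-of-envelope sketch. The 10,000 rows/sec/core figure is the rule of thumb above; the 25% query headroom is my own illustrative assumption, not a Druid recommendation:

```python
import math

# Rule of thumb from above: ~10,000 rows/sec per core for ingestion.
ROWS_PER_SEC_PER_CORE = 10_000

def middlemanager_cores(target_rows_per_sec, query_headroom=0.25):
    """Estimate MiddleManager cores for a target ingestion rate.

    query_headroom is an illustrative assumption: extra capacity
    reserved for real-time queries served by the same processes.
    Returns (ingestion cores, total cores including headroom).
    """
    ingest_cores = math.ceil(target_rows_per_sec / ROWS_PER_SEC_PER_CORE)
    total_cores = math.ceil(ingest_cores * (1 + query_headroom))
    return ingest_cores, total_cores

# 100,000 rows/sec -> 10 ingestion cores, 13 with query headroom.
print(middlemanager_cores(100_000))  # (10, 13)
```

Treat this as a starting point only; the real per-core throughput depends heavily on your rollup, dimension cardinality, and parsing cost.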

Sizing the Historical servers comes down to how much data you want to store (disk space) and the replication factor (usually 2). Processing power then depends on how many queries you expect to run in parallel, how many segments each query covers, and how long each segment takes to process.
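The disk side of that can be sketched the same way. The replication factor of 2 comes from above; the 20% free-space headroom (room for segment rebalancing and growth) is my own illustrative assumption:

```python
def historical_disk_tb(segment_data_tb, replication_factor=2,
                       free_space_headroom=0.2):
    """Estimate total Historical disk needed across the cluster.

    segment_data_tb: size of the segment data after Druid's columnar
    compression and rollup (measure this from a sample ingestion).
    free_space_headroom is an illustrative assumption for rebalancing
    and growth room, not a Druid-documented figure.
    """
    return segment_data_tb * replication_factor * (1 + free_space_headroom)

# 5 TB of segments, replicated twice, with 20% headroom -> 12 TB total.
print(historical_disk_tb(5))  # 12.0
```

Note that segment size on disk is often far smaller than the raw input because of rollup and compression, so measuring a sample ingestion first makes this estimate much more reliable.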

You usually have three master nodes for redundancy in a production environment, and then one or more Brokers and one or more Routers. The master nodes are usually not very big machines (something like an AWS m-series instance) until you accumulate a lot of segments, at which point you’ll need more memory.

It is therefore often better to work backwards from your data size and query performance requirements when sizing a cluster, and companies like Imply can take you through a more formal sizing exercise to work it out.