Hi Ranjit, see inline.
I am trying to design a system that has the following requirements:
- 100 concurrent queries (mostly historical)
- Ingest and store ~1 million events per day
- Query responses must be under a second; queries are mostly aggregate/filter queries
If it helps your decision making: we are handling about 100 concurrent queries per second in production right now and ingesting about 1 million events (usually <100 dimensions, <50 metrics) every 2 seconds, so Druid should be able to scale to your needs.
- What’s the recommended deep storage system for this setup? (I’d prefer S3 to avoid management overhead)
Deep storage is just a permanent backup of segments and is not involved in serving queries, so it won’t affect your query latency. S3 is well supported, so your preference is fine; if you have HDFS available, that is another popular option for deep storage.
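If you go with S3, the deep storage settings live in `common.runtime.properties`. A minimal sketch (the bucket name, base key, and credentials below are placeholders you would substitute with your own):

```properties
# Load the S3 extension so Druid can talk to S3 deep storage
druid.extensions.loadList=["druid-s3-extensions"]

# Use S3 as the deep storage backend
druid.storage.type=s3
druid.storage.bucket=your-druid-bucket
druid.storage.baseKey=druid/segments

# Credentials (placeholders; IAM instance roles can be used instead)
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
```

The equivalent HDFS setup just swaps `druid.storage.type=hdfs` with a `druid.storage.storageDirectory` pointing at an HDFS path.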
- How many Druid cluster nodes should I have in the cluster?
It depends on the type of hardware you have. Given your relatively low volume of data, you’ll likely generate about one segment per day, so you can probably get away with a very minimal setup, combining multiple Druid services on the same node. What type of hardware do you have access to?
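The one-segment-per-day sizing above corresponds to day-level segment granularity in the ingestion spec. A sketch of the relevant fragment (field values are illustrative, not a recommendation for your exact workload):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "NONE"
}
```

With ~1 million events per day, `DAY` granularity keeps each segment in a healthy size range; if your volume grows substantially, you’d move to `HOUR`.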