Hi, I’d like to try out druid for our company.
As it’s very time-consuming to set everything up and pull data in, it would be helpful to understand the resources that I would need.
We store mobile events to be used for analytics and I’d like to have 1 month of data in the cluster ready to be queried, that’s about 15B records.
device_id - high cardinality, should tens of millions distinct values
platform - 4 distinct values
version - 100 distinct values
tags - Array of strings, 5-6 elements, max, 50 distinct values
options - Array of strings 3-4 elements, max 30 distinct value
count - all records will be shipped with value of 1.
I’m planning to execute TopN, GroupBy sort of queries using distinct (for device_id) and filters on dimensions with no more than 10 seconds query time.
Currently, we have a 32 machine Hadoop cluster in production with 24 cores and 64 Gb RAM on each node.
Please provide some insights.