In our legacy analytics platform, we have a metric group consisting of several metrics. All metrics within a group apply to certain dimension combinations. We plan to migrate this application and are evaluating Druid as the aggregation engine.
At first glance, a metric group seems to map readily to a datasource in Druid, but I have a couple of questions:
1. Metrics within a metric group vary widely in cardinality.
E.g., the metric group is website stats over a dimension combination of visitor age group, geo, etc. (up to 7-8 dimensions).
One metric could be page hits (roughly billions per day),
while another could be the number of purchases made on the site (~10k per day).
2. Individual metrics within the same group could be loaded from upstream by different processes/threads, in any order.
We could either model all logical metrics under one dataSource (the query layer stays simple: a single query),
or use one dataSource per metric and combine the results at the query layer. Segment sizes should be fairly small (< 100-150 MB) per my initial estimates.
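To make the single-dataSource option concrete, here is a minimal sketch of what the dataSchema portion of a Druid ingestion spec might look like with both metrics rolled into one dataSource. The dataSource name, dimension names, and metric names (website_stats, age_group, geo, page_hits, purchases) are my assumptions for illustration, not anything prescribed:

```json
{
  "dataSchema": {
    "dataSource": "website_stats",
    "timestampSpec": { "column": "ts", "format": "iso" },
    "dimensionsSpec": {
      "dimensions": ["age_group", "geo"]
    },
    "metricsSpec": [
      { "type": "longSum", "name": "page_hits", "fieldName": "page_hits" },
      { "type": "longSum", "name": "purchases", "fieldName": "purchases" }
    ],
    "granularitySpec": {
      "segmentGranularity": "hour",
      "queryGranularity": "minute",
      "rollup": true
    }
  }
}
```

With queryGranularity set to minute, rollup would pre-aggregate rows to the minutely grain mentioned below; rows where only one of the two metrics is present would simply carry 0 for the other under longSum.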
Can you elaborate on the pros and cons of each approach in terms of storage and performance?
(We would like to store minutely aggregated data with a TTL of 3 days for realtime querying/slicing and dicing,
and precomputed hourly aggregates beyond the 3-day limit.)
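For the 3-day TTL on the minutely data, one way this is typically handled in Druid is via retention (load/drop) rules on the dataSource. A rough sketch, assuming a single default tier (tier name and replicant count are placeholders):

```json
[
  { "type": "loadByPeriod", "period": "P3D",
    "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]
```

Note that retention rules only control which segments stay loaded; the precomputed hourly aggregates beyond 3 days would need to be produced separately, e.g. by re-ingesting or compacting the older data into a coarser-granularity dataSource.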
Thanks and regards