I have scoured the web for examples of Druid setups like this but have not had any luck, so I wanted to ask here before ruling out Druid as a possible solution.
I have a large amount of records in this format:
Timestamp, UserId, Field A
I want to be able to compute percentiles and averages of the number of records per user. I would also like to know how many users have more than X records in a day. With 10-50 million users in a given day, I assume there are too many to store UserId as a dimension.

I'm also interested in the median or average number of unique values of Field A per user. Again, there are too many unique values of Field A to be stored as a dimension.

Does Druid support this kind of group-by-before-aggregation, or would I have to feed Druid data that is already aggregated by user id? I'm looking for day granularity.
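To make the two-stage aggregation concrete, here is a rough Python sketch of the result I'm after (toy data, not Druid code; the field names are just illustrative):

```python
from collections import defaultdict
from statistics import mean, median

# Toy records in my format: (timestamp, user_id, field_a), all in one day.
records = [
    ("2016-01-01T00:01", "u1", "a"),
    ("2016-01-01T00:02", "u1", "b"),
    ("2016-01-01T00:03", "u1", "a"),
    ("2016-01-01T01:00", "u2", "c"),
    ("2016-01-01T02:00", "u3", "a"),
    ("2016-01-01T02:05", "u3", "c"),
]

# Stage 1: group by user id within the day. This is the step I'm not
# sure Druid can do at query time given the user cardinality.
per_user_counts = defaultdict(int)
per_user_field_a = defaultdict(set)
for _, user, field_a in records:
    per_user_counts[user] += 1
    per_user_field_a[user].add(field_a)

# Stage 2: aggregate across users.
counts = sorted(per_user_counts.values())            # per-user record counts
avg_records_per_user = mean(counts)
median_records_per_user = median(counts)
users_with_more_than_1 = sum(1 for c in counts if c > 1)
median_unique_field_a = median(len(s) for s in per_user_field_a.values())
```

In other words, an inner aggregation keyed by UserId, then percentiles/averages over that intermediate result.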
Thanks for the help.