Need help with Druid setup


I hope you are doing well. I am very new to data warehousing and analytics, so I need your help understanding how to set this up.

Please see the attached sheet for sample event data.

We receive millions of events per day of many different types, such as Page_View, Article_Show, Install, App_Launch, Basic_Update and Subscription_Purchase, from different platforms such as WEB, IOS, ANDROID, WAP and PWA.

I have to set up a data warehouse at my company. The use cases are as follows:

  1. We serve lots of articles to our users to read/comment/share etc.
  2. Users can create/update their profile data.
  3. Users can subscribe/unsubscribe to one/multiple newsletters or products.

My question is: how should we store these different data sets (events and profiles) in Druid? Should it be a single datasource or multiple datasources, so that we can easily analyse the data and run queries according to Product's needs?

Based on the sample data, how can we get results for the following queries?

  1. Fetch all reads of the article with id ‘A1’ and author ‘AA’ by users with gender ‘Male’ and city ‘Delhi’ in a given time period, using the latest user profile data. Article_Show events only carry user_id, and the user profile information arrives in other events, so how can the two be combined into one row, with the profile resolved as of the time the query is executed?
  2. Find the details of all users who have an active subscription on a given date in a given city.
  3. Any other type of query over different combinations of these fields.
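
To make the first query concrete, here is a minimal sketch in plain Python (not Druid code) of the result I am after. The field names (`ts`, `event`, `user_id`, `article_id`, `author`, `gender`, `city`), the sample rows and the helper functions are all assumptions made up for illustration; they are not the real schema from the attached sheet. The idea is to take the latest profile per user, attach it to each Article_Show row, and then filter.

```python
from datetime import datetime

# Hypothetical sample rows; field names are assumptions, not the real schema.
profile_events = [
    {"ts": datetime(2023, 1, 1), "user_id": "U1", "gender": "Male", "city": "Mumbai"},
    {"ts": datetime(2023, 1, 5), "user_id": "U1", "gender": "Male", "city": "Delhi"},
    {"ts": datetime(2023, 1, 2), "user_id": "U2", "gender": "Female", "city": "Delhi"},
]
article_events = [
    {"ts": datetime(2023, 1, 3), "event": "Article_Show", "user_id": "U1",
     "article_id": "A1", "author": "AA"},
    {"ts": datetime(2023, 1, 4), "event": "Article_Show", "user_id": "U2",
     "article_id": "A1", "author": "AA"},
    {"ts": datetime(2023, 1, 6), "event": "Article_Show", "user_id": "U1",
     "article_id": "A2", "author": "BB"},
]


def latest_profiles(events, as_of):
    """Return the most recent profile row per user_id at or before `as_of`."""
    latest = {}
    for e in sorted(events, key=lambda row: row["ts"]):
        if e["ts"] <= as_of:
            latest[e["user_id"]] = e  # later rows overwrite earlier ones
    return latest


def article_reads(articles, profiles, start, end, article_id, author, gender, city):
    """Article_Show rows in [start, end), enriched with the latest profile, then filtered."""
    lookup = latest_profiles(profiles, as_of=end)
    rows = []
    for a in articles:
        p = lookup.get(a["user_id"])
        if p is None or not (start <= a["ts"] < end):
            continue
        if (a["article_id"], a["author"]) != (article_id, author):
            continue
        if (p["gender"], p["city"]) != (gender, city):
            continue
        rows.append({**a, "gender": p["gender"], "city": p["city"]})
    return rows


result = article_reads(
    article_events, profile_events,
    start=datetime(2023, 1, 1), end=datetime(2023, 1, 7),
    article_id="A1", author="AA", gender="Male", city="Delhi",
)
for row in result:
    print(row["user_id"], row["article_id"], row["city"])
```

In this made-up data, user U1 read the article while their profile still said Mumbai, but the row is returned because their latest profile says Delhi. That "latest profile" behaviour is what I would like to reproduce in Druid.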

I would really appreciate your help with setting this up and making the data queryable, and I look forward to your response.

Thanks & Regards
Amit Srivastava

Data Format.xlsx (12.8 KB)