Need advice on going forward with Druid architecture design


Our data comes in through a Kinesis stream. We have 150 million events per day flowing through it today, and expect that to grow to 300 million per day within a year. We generate events for many kinds of activity on the site: an event could be a like on an item, a comment by a user, a page view, or even a click.

Since each event type serves a different purpose, we cannot use the same dimension spec for every event in the datasource. I am not sure how to take this forward.

Yes, we could build a layer on top of Tranquility that picks out event types and inserts each one into its own datasource, but we would then have 100-150 datasources. That sounds wrong, since we really have only one source of data. Are there any downsides to having 150 datasources? From what I understand, we would need enough JVMs to support them.

If we go with a single datasource, we need a single DimensionSpec and MetricSpec for it, which would not work, since I am interested in dimensions a, b, c for like events and in dimensions d, e, f for comment events.

What route should we take for our requirements and how do we take this forward?

I would just list all the dimensions a through f in a single datasource, then make sure a, b and c are not null in the query for likes, and likewise that d, e and f are not null in the query for comments.
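
To make that concrete, here is a rough Python sketch that builds the combined dimension list and a native Druid query filter for one event type. The datasource name `site_events`, the interval, and the extra `event_type` dimension are my own assumptions for illustration, not something from the question; the filter shapes (`and`, `not`, and `selector` with a null value) are standard native-query filters, but treat the rest as a starting point rather than a finished spec.

```python
import json

# Hypothetical per-event-type dimensions, mirroring the a-f example above.
DIMENSIONS_BY_EVENT = {
    "like": ["a", "b", "c"],
    "comment": ["d", "e", "f"],
}


def union_dimensions(dims_by_event):
    """Return every dimension once, in first-seen order, after event_type."""
    seen = ["event_type"]  # assumed extra dimension that tags each row
    for dims in dims_by_event.values():
        for dim in dims:
            if dim not in seen:
                seen.append(dim)
    return seen


def not_null(dimension):
    """Filter matching rows where the dimension is present (not null)."""
    return {
        "type": "not",
        "field": {"type": "selector", "dimension": dimension, "value": None},
    }


def event_filter(event_type, dims_by_event):
    """Select one event type and require its own dimensions to be non-null."""
    fields = [{"type": "selector", "dimension": "event_type", "value": event_type}]
    fields += [not_null(dim) for dim in dims_by_event[event_type]]
    return {"type": "and", "fields": fields}


def count_query(datasource, event_type, interval, dims_by_event):
    """Build a daily timeseries query counting stored rows for one event type."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "filter": event_filter(event_type, dims_by_event),
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# One dimension list for the single datasource's ingestion spec.
dimensions_spec = {"dimensions": union_dimensions(DIMENSIONS_BY_EVENT)}

# Example query for likes; datasource name and interval are placeholders.
like_query = count_query(
    "site_events", "like", "2016-01-01/2016-01-02", DIMENSIONS_BY_EVENT
)

print(json.dumps(dimensions_spec))
print(json.dumps(like_query, indent=2))
```

If you do carry an `event_type` dimension, the not-null checks become a safety net rather than the main selector, and you could drop them. Either way, rows for a like event simply leave d, e and f empty, so one datasource covers all event types.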