Tips on how to segment the data

Currently we are using two data sources: one for events where the user interacts with us as an entity (logins, signups, etc.), and a second for what the user is doing more specifically on our platform.

user events = ~10 dimensions, 4 metrics

platform events = ~10 dimensions, 7 metrics

The common dimension between them is username, and sometimes we want to cross-reference them, using specific user events when looking at the platform events.

Is it a good idea to split them like this, or would it be easier to simply put them all in the same data source?

regards,

Robin

Hey Robin,

Both approaches will work, so the usual way to choose is to pick whichever one makes ingestion and data management easier for you. Sharing a datasource can be more efficient in terms of ingestion resources, while splitting them can be simpler to ingest and manage. Check out http://druid.io/docs/latest/querying/multitenancy.html if you haven’t yet; even though it’s about multitenancy, a lot of the advice there applies to your case too (different event types rather than different tenants).
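
To make the difference concrete at query time, here is a rough sketch in Python that builds the same native timeseries query for both layouts. All the names in it are made up for illustration (the datasources "events" and "user_events", the "event_type" dimension, the "logins" metric, and the interval), so swap in whatever your schema actually uses. In the shared layout I'm assuming you add a dimension that records the event type and filter on it; in the split layout the datasource name does that job.

```python
import json


def timeseries_query(datasource, metric, interval, event_type=None):
    """Build a Druid native timeseries query as a plain dict.

    All argument values used below are hypothetical examples.
    """
    query = {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [
            # Sum an ingested long metric column over each day.
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
    }
    if event_type is not None:
        # Shared datasource: restrict to one event type through an
        # assumed "event_type" dimension added at ingestion time.
        query["filter"] = {
            "type": "selector",
            "dimension": "event_type",
            "value": event_type,
        }
    return query


INTERVAL = "2016-01-01/2016-02-01"  # hypothetical interval

# Option 1: one shared datasource, filtered by event type.
shared = timeseries_query("events", "logins", INTERVAL, event_type="user")

# Option 2: separate datasources, no filter needed.
split = timeseries_query("user_events", "logins", INTERVAL)

print(json.dumps(shared, indent=2))
print(json.dumps(split, indent=2))
```

Either dict can be sent as the JSON body of a native query. The only difference on the query side is the extra filter in the shared case, which is why the decision usually comes down to ingestion and data management rather than querying.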