Imagine you are ingesting impression logs from multiple clients. The Kafka events of all clients share:
possibly a few common dimensions, with corresponding metrics.
But each client can have its own dimension set, because their businesses differ.
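To make that concrete, here is roughly what I have in mind (field names and client names are made up for illustration; JSON has no comments, so note that both events share timestamp, clientId, country, and the clicks metric, while each client adds its own dimensions):

```json
{"timestamp": "2014-01-01T00:00:00Z", "clientId": "acme",   "country": "US", "adId": "a1",    "clicks": 1}
{"timestamp": "2014-01-01T00:00:01Z", "clientId": "globex", "country": "DE", "storeId": "s7", "basketSize": 3, "clicks": 1}
```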
For simplicity, let's assume we will never have to scale horizontally beyond a single node; we are just interested in real-time interactivity.
For instance, this says that you can have some dimensions missing in some events, but what about the schemaless case described above?
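If I understand the docs correctly, leaving the `dimensions` list empty in the `dimensionsSpec` makes Druid treat every field (other than the timestamp and any exclusions) as a dimension, which sounds like it would cover this case. A rough sketch of the parser fragment I imagine (metric names are from my made-up example above):

```json
"parser": {
  "type": "string",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {"column": "timestamp", "format": "iso"},
    "dimensionsSpec": {
      "dimensions": [],
      "dimensionExclusions": ["clicks", "basketSize"]
    }
  }
}
```

Is this the intended way to handle events whose dimension sets differ per client?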
Should each client have its own Table DataSource, so that the Storm/Spark Streaming job feeding the real-time node "categorizes" events by client and stores them into different Druid DataSources?
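If that is the right approach, I imagine the realtime node's spec file would become an array with one ingestion spec per client DataSource, something like this (a very rough sketch based on my reading of the spec-file format; datasource names are hypothetical and I have omitted the parser, ioConfig, and tuningConfig details):

```json
[
  { "dataSchema": { "dataSource": "impressions_acme" } },
  { "dataSchema": { "dataSource": "impressions_globex" } }
]
```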
Can this run on a simple Druid setup with a single real-time service, like the one in Druid's tutorial? From what I know, Druid is able to ingest only a single stream by default, right?
Thank you! Jakub