Separate datasources have two advantages: re-indexing is simpler (you can do one datasource at a time instead of all at once), and customers are better isolated during realtime ingestion (one customer sending excessive data won’t impact others). The disadvantage is that each datasource needs at least one realtime task, and each task is a fixed bundle of resources that generally includes at least one CPU and a couple of GB of memory.
I think that in general, if most of your customers have larger datasets (more than a billion events per month each), it makes sense to have separate datasources. If most have fewer than a billion events per month, then for efficiency’s sake during realtime ingestion it makes sense to combine them into a smaller number of datasources (possibly one).
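To make the rule of thumb concrete, here is a rough Python sketch that assigns customers to datasources and estimates the minimum realtime footprint. The datasource names, the exact threshold, and the per-task figures (1 CPU, 2 GB) are assumptions for illustration only, not Druid defaults, and it assumes exactly one realtime task per datasource.

```python
# Rule of thumb from the discussion above; tune it for your own cluster.
LARGE_CUSTOMER_EVENTS_PER_MONTH = 1_000_000_000

# Assumed per-task minimums ("at least one CPU and a couple of GB").
MIN_CPUS_PER_TASK = 1
MIN_MEMORY_GB_PER_TASK = 2

# Hypothetical name for the datasource shared by smaller customers.
SHARED_DATASOURCE = "events_shared"


def datasource_for(customer_id: str, events_per_month: int) -> str:
    """Give large customers their own datasource; pool everyone else."""
    if events_per_month > LARGE_CUSTOMER_EVENTS_PER_MONTH:
        return f"events_{customer_id}"
    return SHARED_DATASOURCE


def plan(customers: dict[str, int]) -> dict:
    """Estimate the minimum realtime footprint for a set of customers.

    Assumes one realtime task per datasource, which is the lower bound;
    real deployments may run more tasks per datasource.
    """
    datasources = sorted(
        {datasource_for(cid, n) for cid, n in customers.items()}
    )
    tasks = len(datasources)
    return {
        "datasources": datasources,
        "min_tasks": tasks,
        "min_cpus": tasks * MIN_CPUS_PER_TASK,
        "min_memory_gb": tasks * MIN_MEMORY_GB_PER_TASK,
    }


if __name__ == "__main__":
    # Made-up volumes, purely for illustration.
    example = {"acme": 5_000_000_000, "bolt": 20_000_000, "cog": 300_000_000}
    print(plan(example))
```

The point of the sketch is that the footprint grows with the number of datasources rather than with the number of customers, which is why pooling many small customers is cheaper during realtime ingestion.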
To answer your other two questions:
a) Yes, new datasources are created automatically. Just asking Druid to ingest into a new datasource is enough to create it.
b) Yes, if you use Tranquility you can send data to as many datasources as you want from the same Bolt. Keep in mind that since each datasource has its own tasks, each datasource will require at least one outgoing TCP connection (one per task).