Pros and Cons of multiple data sources in Druid


In one of the designs we came up with for our requirement, we have to create 15 data sources and ingest batch data into them (Hadoop batch ingestion). We expect at most one lakh (100,000) records to be ingested per day into these 15 data sources. Our data is varied, so merging it into a smaller number of data sources would take too much effort.

There is a concern that having 15 data sources might consume more memory (in terms of JVM, etc.) and put a strain on the existing Druid cluster. Can anybody please let us know, from production experience, whether multiple data sources cause issues? I have read blogs describing larger production deployments that use hundreds of data sources, but I am looking for more detail on this.

Any help is very much appreciated.

Thanks very much,

Panner Selvam Velmyl.