Assume I have 2 schemas that need to be stored in Druid and rolled up.
Schema 1:
Dimensions: D1, D2, D3
Metrics: M1, M2, M3
Schema 2:
Dimensions: D2, D3, D4
Metrics: M2, M3, M4
One way of organizing this is to put each schema in its own datasource.
Schema 1 in Datasource1
Schema 2 in Datasource2
Question 1: Does it make sense to store both schemas in a single datasource instead, with an identifier dimension, say “D0”, holding the value “S1” or “S2” to denote which schema each row belongs to? The combined schema would be:
Dimensions: D0, D1, D2, D3, D4
Metrics: M1, M2, M3, M4
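To make the combined layout concrete, here is a minimal Python sketch of how I imagine a row from each schema being mapped onto it before ingestion. The helper name `to_combined_row` and the sample values are hypothetical, the timestamp column is left out for brevity, and this is plain Python rather than any Druid API.

```python
# Column lists for the combined datasource described above.
DIMENSIONS = ["D0", "D1", "D2", "D3", "D4"]
METRICS = ["M1", "M2", "M3", "M4"]


def to_combined_row(row, schema_id):
    """Map a row from one schema onto the combined layout.

    Columns the source schema does not have are filled with None.
    (Hypothetical helper, for illustration only.)
    """
    combined = {"D0": schema_id}  # identifier dimension: "S1" or "S2"
    for col in DIMENSIONS[1:] + METRICS:
        combined[col] = row.get(col)
    return combined


# Made-up sample rows, one per schema.
s1_row = {"D1": "a", "D2": "b", "D3": "c", "M1": 1, "M2": 2, "M3": 3}
s2_row = {"D2": "b", "D3": "c", "D4": "d", "M2": 5, "M3": 6, "M4": 7}

combined_rows = [
    to_combined_row(s1_row, "S1"),
    to_combined_row(s2_row, "S2"),
]

for r in combined_rows:
    print(r)
```

In this sketch every “S1” row has D4 and M4 empty, and every “S2” row has D1 and M1 empty, which is the situation the next question is about.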
Question 2: Would this affect query performance, roll-up, or disk space, given that some dimensions/metrics will be null or empty in rows that come from the other schema?
Question 3: Does Druid have a limit or best practice for the number of dimensions or metrics in a single datasource? Is it OK for these to run into the thousands?