I have a processing platform that generates 'N' categories of metrics, each in its own location in HDFS. I would like to ingest each of these categories into Druid. I see one way of doing this:
Run an ingestion task for each category into its own datasource in Druid. So category1, available at XYZ/category1/YYYY/MM/DD, goes into a category1 datasource in Druid; category2, available at XYZ/category2/YYYY/MM/DD, goes into a category2 datasource; and so on.
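For reference, this is roughly what one per-category task would look like, a minimal sketch assuming the classic Hadoop-based `index_hadoop` batch task (the namenode address, interval, and column names are placeholders, not from my actual setup):

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "category1",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2016-06-01/2016-06-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "hdfs://namenode:8020/XYZ/category1/2016/06/01"
      }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}
```

The same spec, with `dataSource` and `paths` swapped, would be submitted once per category.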
Since each category resides in its own datasource, I need to run queries against each datasource separately.
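So querying would look something like this for each datasource in turn (a sketch using a native timeseries query; the aggregator name and metric column are hypothetical):

```json
{
  "queryType": "timeseries",
  "dataSource": "category1",
  "granularity": "day",
  "intervals": ["2016-06-01/2016-06-08"],
  "aggregations": [
    { "type": "longSum", "name": "total", "fieldName": "count" }
  ]
}
```

If I ever need results across categories, I'd have to fan out N such queries and merge client-side, since each datasource is queried independently.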
If the schema changes for any category, I just re-ingest with the modified schema, and then I can query with the changed schema.
If historical data changes in HDFS for some reason, I re-run ingestion for those time periods, and Druid will simply create new segments that overwrite the existing ones for those periods.
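My understanding is that the interval list in the `granularitySpec` is what scopes the overwrite: re-running the task with the affected intervals produces a new segment version for exactly those periods, and queries switch over to it atomically. A sketch of the relevant fragment (interval is a placeholder):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "NONE",
  "intervals": ["2016-03-01/2016-03-08"]
}
```

Segments outside the listed intervals should be untouched by the re-ingestion.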
Do you think there is a better way of doing this? And are my assumptions correct?