Druid datasource schema design inputs

I am exploring best practices for designing a Druid datasource.

  • I am building the schema/datasource to hold test results for a manufactured product that goes through multiple levels of testing.

  • Each test happens at a different time, and we receive real-time test results for each product, which is uniquely identified by serial_num.

  • I get various test results from each test station, along with the start_time and end_time of the test.

  • I am trying to use the start_time of the test to segment the data in the Druid datasource.

  • Most of the test results are of decimal or double data type, and a final result column indicates whether the product passed or failed testing.

  • I can insert all the individual test results from each test station into Druid.

  • But most of my queries on the Druid datasource need a flattened view of all the test results across the different test stations for a given product/serial_num.

  • I was planning to use all the test results as metric columns, and key columns such as serial_num, test station, and a couple of others as dimensions.

  • I would then use a groupBy query with a filter on the serial_num of interest and apply a max aggregation to all the metrics across the different test stations (whose rows may be in different Druid segments) to get a combined view of all the test results.
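
To make the last point concrete, here is a rough sketch of the query I have in mind, built as a Python dict and printed as JSON for the native query API. The datasource name (test_results), the metric names (voltage, current), the serial number and the interval are all made-up placeholders, not my real columns.

```python
import json


def build_flatten_query(datasource, serial_num, metrics, intervals):
    """Build a native groupBy query that collapses every row for one
    serial_num into a single row, taking the max of each metric."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": intervals,
        # "all" puts every matching row into one time bucket.
        "granularity": "all",
        "dimensions": ["serial_num"],
        # Keep only the rows for the serial number of interest.
        "filter": {
            "type": "selector",
            "dimension": "serial_num",
            "value": serial_num,
        },
        # One doubleMax aggregator per test-result metric.
        "aggregations": [
            {"type": "doubleMax", "name": m, "fieldName": m} for m in metrics
        ],
    }


# Placeholder values, for illustration only.
query = build_flatten_query(
    datasource="test_results",
    serial_num="SN-0001",
    metrics=["voltage", "current"],
    intervals=["2024-01-01/2024-02-01"],
)
print(json.dumps(query, indent=2))
```

My assumption here is that each metric is reported by only one test station, so max simply picks up the one real value. How rows that lack a given metric are treated would depend on Druid's null-handling configuration.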

I would appreciate any input on designing the data model for this scenario with performance in mind.
Is the above approach good enough for the Druid datasource schema?

Thanks,
Vish