We have been using Druid for about 8 months now and are feeling pretty comfortable with it. Our data source has two dimensions, segment_id and intersected_segment_id, and one metric, count.
Each count value is keyed by the tuple (date, segment_id, intersected_segment_id).
Currently, if there are 10 segment_ids and 10 intersected_segment_ids for a given day, we would have about 100 rows. In reality we have > 100,000 segment_ids and 13k intersected_segment_ids, so the number of rows for a given day is very high.
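To put a rough number on that, here is a quick back-of-the-envelope calculation in Python. It assumes the worst case where every segment_id is paired with every intersected_segment_id on a given day, and it uses 100,000 and 13,000 as round figures, so treat the result as an order-of-magnitude estimate rather than a measured row count.

```python
def max_rows_per_day(num_segments: int, num_intersected: int) -> int:
    """Upper bound on daily rows if every (segment_id, intersected_segment_id) pair occurs."""
    return num_segments * num_intersected


# The small example from above: 10 x 10 pairs.
small = max_rows_per_day(10, 10)

# Round figures for our real data (assumed, not exact counts).
large = max_rows_per_day(100_000, 13_000)

print(small)  # 100
print(large)  # 1300000000
```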
Notice that most of the data above is redundant for a given day, and the number of rows could be reduced to (number of segment_id).
Ideally we would prefer something like a multi-value dimension with a corresponding multi-value metric. However, I did not find any such feature, or an existing feature request for one.
I have briefly looked at Druid's code and thought it might be better to get some opinions from the community first.
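To make the idea a bit more concrete, here is a rough sketch in plain Python of the row shape I have in mind. This is not Druid code and does not use any Druid API. The sample rows, the function name collapse, and the field names intersected_segment_ids and counts are all made up for illustration.

```python
from collections import defaultdict


def collapse(rows):
    """Group flat (date, segment_id, intersected_segment_id, count) rows
    into one row per (date, segment_id), holding two parallel lists."""
    grouped = defaultdict(lambda: {"intersected_segment_ids": [], "counts": []})
    for date, segment_id, intersected_segment_id, count in rows:
        entry = grouped[(date, segment_id)]
        # Position i in both lists refers to the same intersection.
        entry["intersected_segment_ids"].append(intersected_segment_id)
        entry["counts"].append(count)
    return [
        {"date": date, "segment_id": segment_id, **entry}
        for (date, segment_id), entry in grouped.items()
    ]


# Hypothetical sample data: 2 segment_ids x 3 intersected_segment_ids = 6 flat rows.
flat_rows = [
    ("2016-01-01", "s1", "i1", 5),
    ("2016-01-01", "s1", "i2", 7),
    ("2016-01-01", "s1", "i3", 1),
    ("2016-01-01", "s2", "i1", 4),
    ("2016-01-01", "s2", "i2", 9),
    ("2016-01-01", "s2", "i3", 2),
]

collapsed = collapse(flat_rows)
for row in collapsed:
    print(row)
```

With this layout the six flat rows become two rows, one per segment_id, and each count stays aligned with its intersected_segment_id by list position.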
My main questions around this are:
Would this be a feasible feature?
What would be a good starting place? Some pointers to the relevant code would definitely help.
Happy to clarify anything if required.