We have a datasource with a multi-value dimension named “Advertiser Domain”, which contains lists of domain names.
When I group on this dimension and also filter on a single domain, I get back a list of all domains that co-occurred with the one selected in the filter.
(See screenshot below)
So far, so good. But when I look at the measures, the totals are lower than some individual entries, even though there are no negative values that could cause this (I reversed the sort order to confirm that no negative values exist).
In the attached screenshot, the domain “verizon.com” has a “Served” count of 78.8k, but the total “Served” count is only 80.43k.
I understand that the individual measures overlap, but how can the value for a single entry in a multi-value dimension be higher than the total?
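To make my expectation concrete, here is a small Python simulation of how I understand Druid's multi-value semantics. It assumes that a filter on a multi-value dimension matches a row if any of its values match, and that grouping on that dimension counts each matching row once per value in its list. The rows, domain names (other than verizon.com) and numbers are made up for illustration, not taken from our datasource.

```python
from collections import defaultdict

# Hypothetical sample rows (not real data): each row has a multi-value
# "domains" list and a "served" metric. Assumes no duplicate values per row.
ROWS = [
    {"domains": ["selected.example", "verizon.com"], "served": 10},
    {"domains": ["selected.example"], "served": 5},
    {"domains": ["verizon.com", "other.example"], "served": 7},
]


def matches(row, value):
    # A filter on a multi-value dimension matches a row if ANY of its values match.
    return value in row["domains"]


def total(rows, value):
    # Totals (timeseries-style): every matching row is counted exactly once.
    return sum(r["served"] for r in rows if matches(r, value))


def breakdown(rows, value):
    # Breakdown (topN-style): each matching row contributes its full metric
    # to every value in its list, so the groups overlap.
    groups = defaultdict(int)
    for r in rows:
        if matches(r, value):
            for domain in r["domains"]:
                groups[domain] += r["served"]
    return dict(groups)


t = total(ROWS, "selected.example")
g = breakdown(ROWS, "selected.example")
print(t)  # 15
print(g)  # {'selected.example': 15, 'verizon.com': 10}
```

Under this model the groups sum to 25, more than the total of 15 because of the overlap, but no single group can exceed 15, which is why I would not expect any individual entry to be higher than the total.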
To rule out a bug in Pivot, I enabled debug mode and inspected the native Druid queries that Pivot generates. Both the timeseries query submitted for the totals and the topN query submitted for the breakdown contain exactly the same filter expression.
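For reference, the two queries have roughly the following shape, written here as Python dicts. This is a simplified sketch rather than Pivot's actual output: the datasource name, column and metric names, interval, filter value and threshold are all hypothetical placeholders, and I am assuming a simple selector filter and a longSum aggregation.

```python
import json

# Hypothetical placeholders: the real names and interval are not shown here.
DATASOURCE = "ad_events"
DIMENSION = "advertiser_domain"
METRIC = "served"
INTERVAL = "2017-01-01/2017-02-01"

# The same filter object is shared by both queries.
shared_filter = {"type": "selector", "dimension": DIMENSION, "value": "selected.example"}
aggregations = [{"type": "longSum", "name": METRIC, "fieldName": METRIC}]

# Totals: a single timeseries bucket over the whole interval.
totals_query = {
    "queryType": "timeseries",
    "dataSource": DATASOURCE,
    "intervals": [INTERVAL],
    "granularity": "all",
    "filter": shared_filter,
    "aggregations": aggregations,
}

# Breakdown: topN over the multi-value dimension, ranked by the metric.
breakdown_query = {
    "queryType": "topN",
    "dataSource": DATASOURCE,
    "intervals": [INTERVAL],
    "granularity": "all",
    "dimension": DIMENSION,
    "metric": METRIC,
    "threshold": 100,
    "filter": shared_filter,
    "aggregations": aggregations,
}

print(json.dumps(breakdown_query, indent=2))
```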
Is this a bug or am I misunderstanding how multi-value dimensions work?