Hi Jaebin, which docs were wrong? Are you able to submit a PR to fix them?
Regarding returning two “dimensions” in a topN (the raw dimension and the lookup dimension): that is not currently possible. It is possible, however, to filter on the raw ID and use a lookup for the topN dimension that is returned.
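As a rough sketch of that second sentence, here is how such a topN query body could be assembled in Python. The data source `events`, the dimension `id`, the lookup name `id_to_name`, the metric and the interval are all made up for illustration, and the `registeredLookup` extraction function assumes a Druid version that supports it (older namespaced-lookup setups spell the lookup spec differently), so please check it against the docs for your version.

```python
import json


def build_topn_query(data_source, raw_ids, lookup_name, interval):
    """Build a topN query that filters on raw IDs but returns looked-up names."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        # Filter on the raw dimension values (the IDs as stored in Druid).
        "filter": {"type": "in", "dimension": "id", "values": list(raw_ids)},
        # The single topN dimension is the lookup-translated value.
        "dimension": {
            "type": "extraction",
            "dimension": "id",
            "outputName": "name",
            "extractionFn": {
                "type": "registeredLookup",  # assumption: depends on Druid version
                "lookup": lookup_name,       # hypothetical lookup name
                "retainMissingValue": True,
            },
        },
        "metric": "count",
        "threshold": 10,
        "aggregations": [{"type": "count", "name": "count"}],
    }


query = build_topn_query(
    "events", ["1001", "1002"], "id_to_name", "2016-01-01/2016-02-01"
)
print(json.dumps(query, indent=2))
```

Note that the result rows only carry `name`, not the raw `id`, which is exactly the limitation described above.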
So the question now becomes: how do you get the desired association between the “id” and the “name” without having to know about the raw data?
Unfortunately, the dataset you’ve described is a non-injective mapping (several IDs can share one name) that you want to treat as injective. I see two groups of solutions:
- Make it injective
  - If you are able to break up the namespaced lookup into multiple injective lookups (I don’t know if this is possible with your dataset), that might solve your issues.
  - You have the option of adding fun things like non-printing Unicode characters (such as the zero-width space) to the "name"s to make them unique, if they only have to “look” the same.
  - Find some sort of distinguishing characteristic for which you can guarantee injective properties. Example: if you know that IDs map to unique names within a particular metro zone, then any query that only covers one metro zone is injective and can use the big lookup with the raw name. For queries which cover multiple metro zones, you can use a different lookup definition, maybe one that pulls from a table (or view) where the metro zone is appended to the name.
- Change the query
  - Do a group-by query, which will be much slower but will return any number of dimensions you want.
  - Get the IDs back and handle the mapping in a second step (either by querying the DB or by issuing a second query to Druid). Not really ideal.
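To make the “make it injective” ideas a bit more concrete, here is a small Python sketch that checks whether an id-to-name mapping is injective and then applies two of the tricks above: padding repeated names with zero-width spaces, and appending a metro zone. The sample IDs, names and zones are invented, and the helper names are hypothetical. The zero-width-space trick assumes the original names do not already end in that character, and the zone trick is only injective if names really are unique within each zone.

```python
from collections import defaultdict

ZWSP = "\u200b"  # zero-width space: invisible when rendered


def is_injective(mapping):
    """Return True if no two keys map to the same value."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def pad_duplicates(mapping):
    """Append 0, 1, 2, ... zero-width spaces to repeated names."""
    seen = defaultdict(int)
    result = {}
    for key in sorted(mapping):  # sorted for a deterministic assignment
        name = mapping[key]
        result[key] = name + ZWSP * seen[name]
        seen[name] += 1
    return result


def append_zone(mapping, zone_by_id):
    """Append each ID's metro zone to its name."""
    return {key: f"{name} ({zone_by_id[key]})" for key, name in mapping.items()}


id_to_name = {"1": "Main St Cafe", "2": "Main St Cafe", "3": "Harbor Grill"}
zone_by_id = {"1": "SF", "2": "NYC", "3": "SF"}

padded = pad_duplicates(id_to_name)
zoned = append_zone(id_to_name, zone_by_id)

print(is_injective(id_to_name))  # False
print(is_injective(padded))      # True
print(is_injective(zoned))       # True
```

The padded or zoned mapping would then be what you load into the lookup table (or expose as a view) instead of the raw names.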
IMHO, being able to return the original dimension value alongside the extractionFn-modified value would be a neat feature, but it is not currently implemented.
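In the meantime, the group-by option mentioned above might look roughly like the sketch below, which asks for both the raw `id` and its looked-up `name` in one query. As before, the data source, dimension, lookup name and interval are placeholders, and the `registeredLookup` extraction function is an assumption about your Druid version.

```python
import json


def build_groupby_query(data_source, lookup_name, interval, limit=10):
    """Build a groupBy query returning both the raw ID and its looked-up name."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        "dimensions": [
            # Raw dimension value, returned as-is.
            "id",
            # Same dimension, passed through the lookup and returned as "name".
            {
                "type": "extraction",
                "dimension": "id",
                "outputName": "name",
                "extractionFn": {
                    "type": "registeredLookup",  # assumption: depends on Druid version
                    "lookup": lookup_name,       # hypothetical lookup name
                    "retainMissingValue": True,
                },
            },
        ],
        "aggregations": [{"type": "count", "name": "count"}],
        # Approximate the topN ordering: highest counts first, capped at `limit`.
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": "count", "direction": "descending"}],
        },
    }


groupby_query = build_groupby_query("events", "id_to_name", "2016-01-01/2016-02-01")
print(json.dumps(groupby_query, indent=2))
```

Because `id` is one of the grouping dimensions, two IDs that share a name come back as separate rows, at the cost of the slower group-by engine.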
Sorry I don’t have a good out-of-the-box solution; these are just some initial thoughts.