Storing and filtering a high-cardinality dimension?


I have a high-cardinality dimension (IP). In my case almost every IP will be unique. Now I would like to filter on IP ranges (for example

The most basic solution would be to store it as a normal dimension, but from what I understand this is not optimal for Druid… Can I do something smart, for example using a ThetaSketch? It would be OK for me if the filtering is not exact.
(I already have a ThetaSketch to count the number of unique IPs)


A theta sketch might work, but have you thought about storing the range itself as a dimension, like “ip_base” / “net_mask”, and then using a business logic layer to express the desired ranges?
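One possible reading of the “ip_base” / “net_mask” suggestion, sketched in Python: at ingestion time, derive one low-cardinality dimension per prefix length, and let the business logic layer turn a requested CIDR block into an equality filter on the matching dimension. The prefix lengths and the `ip_base_<n>` dimension names are hypothetical; the returned dict follows the shape of a Druid selector filter.

```python
import ipaddress

# Hypothetical prefix lengths precomputed at ingestion time.
PREFIX_LENGTHS = (8, 16, 24)


def prefix_dimensions(ip: str) -> dict[str, str]:
    """Return one low-cardinality dimension per prefix length for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)
    dims = {}
    for length in PREFIX_LENGTHS:
        # strict=False accepts a host address and yields its enclosing network.
        net = ipaddress.IPv4Network(f"{addr}/{length}", strict=False)
        dims[f"ip_base_{length}"] = str(net.network_address)
    return dims


def selector_for(cidr: str) -> dict:
    """Translate a CIDR block into an equality filter on the matching prefix dimension."""
    net = ipaddress.IPv4Network(cidr)
    if net.prefixlen not in PREFIX_LENGTHS:
        raise ValueError(f"no precomputed dimension for /{net.prefixlen}")
    return {
        "type": "selector",
        "dimension": f"ip_base_{net.prefixlen}",  # hypothetical dimension name
        "value": str(net.network_address),
    }
```

The trade-off is that only ranges aligned to one of the precomputed prefix lengths can be queried this way.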

Sorry, maybe I wasn't clear in my first message. The filtering needs to work dynamically, so that I can pick any range to filter on when I run my Druid query.
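Picking any range at query time is easier if every address is treated as a 32-bit integer: a CIDR block or a start-end pair then reduces to two inclusive bounds, and membership becomes two integer comparisons. A minimal sketch using Python's standard `ipaddress` module; the `start-end` input syntax is an assumption for illustration.

```python
import ipaddress


def range_bounds(spec: str) -> tuple[int, int]:
    """Turn 'a.b.c.d/n' or 'a.b.c.d-e.f.g.h' (assumed syntax) into inclusive integer bounds."""
    if "-" in spec:
        start, end = (ipaddress.IPv4Address(p.strip()) for p in spec.split("-", 1))
        lo, hi = int(start), int(end)
    else:
        net = ipaddress.IPv4Network(spec, strict=False)
        lo, hi = int(net.network_address), int(net.broadcast_address)
    if lo > hi:
        raise ValueError(f"empty range: {spec}")
    return lo, hi


def in_range(ip: str, bounds: tuple[int, int]) -> bool:
    """Check an address against inclusive bounds with two integer comparisons."""
    lo, hi = bounds
    return lo <= int(ipaddress.IPv4Address(ip)) <= hi
```

Because the bounds are computed per query, nothing about the allowed ranges has to be fixed at ingestion time.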


I don’t think there’s native support for it yet, but such a filter on an integer dimension would be a nice reference implementation for numeric dimension handling. Would you mind filing a GitHub issue describing what kind of query you want to do and how you intend to use the results?
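Until a native integer-dimension filter exists, one workaround (not proposed in the thread) is to store the address as a zero-padded 10-digit decimal string, so that lexicographic string order matches numeric order, and filter with a lexicographic range. The sketch below assumes a Druid version that provides the `bound` filter with its default lexicographic ordering; the `ip_encoded` dimension name used in the example is hypothetical.

```python
import ipaddress

WIDTH = 10  # 2**32 - 1 = 4294967295 has 10 decimal digits


def encode_ip(ip: str) -> str:
    """Zero-padded decimal string, so lexicographic order equals numeric order."""
    return str(int(ipaddress.IPv4Address(ip))).zfill(WIDTH)


def bound_filter(dimension: str, first_ip: str, last_ip: str) -> dict:
    """Inclusive bound filter over the encoded dimension (assumes Druid's bound filter)."""
    return {
        "type": "bound",
        "dimension": dimension,
        "lower": encode_ip(first_ip),
        "upper": encode_ip(last_ip),
        "lowerStrict": False,
        "upperStrict": False,
    }
```

For example, `bound_filter("ip_encoded", "10.0.0.0", "10.255.255.255")` covers all of 10.0.0.0/8. Note that this does not lower the cardinality: the encoded dimension still has roughly one value per IP. It only makes an arbitrary range expressible as a single filter.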