I have bad news and good news.
The bad news: the solution I can offer is pretty clunky and inelegant.
The good news: there is work being done on multiple fronts to make this a lot better soon.
The (current) solution:
Manually run two queries, one to get the list of users and another to filter on that list and get the totals.
Query 1 ideally should be this (in PlyQL):
select user, count(distinct device) as devices_count from data group by user having devices_count > 1.5 and devices_count < 2.5
notice the ±0.5 error bound on the device_count this is to account for having approximate counts there that may not be integers.
BUT THAT WONT WORK! there is a bug in Druid where you can not have a having filter on the result of a HyperLogLog
To get around that you could do this (in PlyQL or equivalently in Druid):
select user, count(distinct device)-2 as devices_count_above_2 from data group by user having devices_count_above_2 > -0.5 and devices_count_above_2 < 0.5
No that you have a list of users the second query is straight forward.
This approach is still not optimal. You have to run two queries and you have to materialize what might be a huge list of users.
There are 3 areas which are getting fixed right now:
The having on HLL bug is being worked on.
There is talk of adding support for Bloom Filter based filters which would prevent you from having to materialize a huge list of users
I am working on a planner feature in Plywood/PlyQL to allow you to use a query as a filter input to another query.