Problem to create complex query


I’m very new to Druid, but I’ve been reading a lot about it and after some tests I find it more than interesting.

Currently I’m trying to query a data set to see if I can accomplish what I need, but I can’t find how to do it.

I’m using the wikiticker database that is present on Druid’s Quickstart tutorial on their webpage ( For the example purpose, let’s use just two dimensions from that database: user and page.

What I need to do is to group users and generate a field like ‘edited_pages’ which should be a multiple values field (with all the edited URLs), and then filter by that new field.

For example: users that edited (page A and page B and not page C) or page D .

I could use the HAVING clause but since this query is generated dynamically in a user front-end, I think it’s more difficult to generate it programmatically.

Is this possible using Druid queries?

Thanks a lot !

Hi Federico, what is the SQL query you are trying to use?

FWIW, you may consider using

Hi Fangjin Yang! That was really quick :slight_smile:

Currently in SQL there is no query that could satisfy this directly. But I could join on the user id for as many conditions as I want for the same user and then make the above condition on the WHERE clause using those joins.

I will make myself a little more clear about the problem, here is the dataset of users and browsed pages:

user / page

user1 / A

user2 / B

user1 / B

user3 / C

user4 / C

user2 / A

user5 / A

user5 / C

user2 / C

So for example, I want to know which users went to page A and page B but not page C.

This query should return only user1. How can I achieve this with Druid’s current capabilities?

Thanks for the link about plysql, I will look into it.

Hi Frederico, I believe your use case is solved by the datasketches module in Druid: (see examples)

Yes Fangjin Yang! That seems to be the solution :).

Thank you very much !