Is querying the cardinality of a groupBy result a usual query pattern in Druid?

I’m using Druid 0.12.0.

Here is the query I want to express:


select count(distinct user_id)
from (
    select user_id
    from data_source
    where event_date between '2018-12-01' and '2018-12-31'
    group by user_id
    having count(*) >= 2
) T

user_id is a very high-cardinality dimension; it has around 10 million distinct values, possibly more.

I wrote the Druid query below.


{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "test-data-source",
      "dimensions": ["user_id"],
      "granularity": "all",
      "intervals": ["2018-12-01/2019-01-01"],
      "aggregations": [
        {
          "type": "longSum",
          "fieldName": "count",
          "name": "count"
        }
      ],
      "having": {
        "type": "greaterThan",
        "aggregation": "count",
        "value": 1
      }
    }
  },
  "granularity": "all",
  "intervals": ["2018-12-01/2019-01-01"],
  "aggregations": [
    {
      "type": "cardinality",
      "name": "totalCount",
      "fields": ["user_id"],
      "byRow": false
    }
  ]
}
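
For reference, I understand that groupBy tuning options can also be attached per query through a context block at the top level of the outer query. A minimal sketch of what I mean (the parameter names come from the groupBy query docs; the values are only examples, not settings I actually ran with):

"context": {
  "groupByStrategy": "v2",
  "maxOnDiskStorage": 1073741824,
  "timeout": 120000
}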

Is this nested groupBy a usual query pattern for Druid? I found that Spark shows better performance on the same workload. I used a groupBy query even though the Druid docs usually suggest avoiding groupBy, because I have no idea of another way to express this.

The query takes almost a minute, and I wonder whether that is actually a reasonable speed for Druid. What I mean is: is there anything more I should tune? I have tried tuning many parameters on the historical and broker nodes, but it didn't get any better.
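
To be concrete about what I mean by tuning, these are the kinds of groupBy-related settings I have been adjusting in runtime.properties on the historical and broker nodes (the values below are only placeholders, not my actual configuration):

druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
druid.query.groupBy.maxOnDiskStorage=1073741824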