Is an intersect aggregate operation possible in Druid?

Hi, I'd like to compute an intersect aggregate using Druid.

My logs are as follows:

dt          user   category
2017091900  user1  1
2017091900  user2  2
2017091900  user1  2
2017091900  user2  3

I want to get results like the query below.

select count(distinct user)
from (
  select user, count(*)
  from user_log_table
  where dt = '2017091900' and category in ('1', '2')
  group by user
  having count(*) > 1
) t

With the Druid query below, I can only get the union count.

{
  "queryType": "topN",
  "dataSource": "user_log_table",
  "granularity": "all",
  "dimension": "dt",
  "threshold": 5,
  "metric": "total_usage",
  "aggregations": [
    {
      "type": "cardinality",
      "name": "total_usage",
      "fields": ["user"]
    }
  ],
  "intervals": [
    "2017-09-19T00:00:06+00:00/2017-09-20T00:00:06+00:00"
  ],
  "filter": {
    "type": "or",
    "fields": [
      { "type": "selector", "dimension": "category", "value": "1" },
      { "type": "selector", "dimension": "category", "value": "2" }
    ]
  }
}

e.g. union case (users seen in category 1 or 2) => count 2

dt          user   category
2017091900  user1  1
2017091900  user2  2
2017091900  user1  2
2017091900  user2  3

e.g. intersect case (users seen in both category 1 and 2) => count 1

dt          user   category
2017091900  user1  1
2017091900  user2  2
2017091900  user1  2
2017091900  user2  3
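
To make the two cases concrete, here is a minimal Python sketch over the four sample rows. It builds one set of distinct users per category and compares the union with the intersection. The variable names are only illustrative and have nothing to do with Druid itself. Note that the SQL `having count(*) > 1` gives the same answer as the intersection only when each (user, category) pair appears at most once in the log.

```python
# Sample rows from the log: (dt, user, category)
rows = [
    ("2017091900", "user1", "1"),
    ("2017091900", "user2", "2"),
    ("2017091900", "user1", "2"),
    ("2017091900", "user2", "3"),
]

# One set of distinct users per category of interest
users_cat1 = {user for _, user, cat in rows if cat == "1"}
users_cat2 = {user for _, user, cat in rows if cat == "2"}

# Union: users seen in category 1 OR 2
union_count = len(users_cat1 | users_cat2)

# Intersection: users seen in category 1 AND 2
intersect_count = len(users_cat1 & users_cat2)

print("union:", union_count)
print("intersect:", intersect_count)
```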

Is there any other way I can get the intersect result with a Druid query?

I found the answer myself.
The intersect count can be obtained with the DataSketches aggregator.

http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html

In my case, my query is shown below.

{
  "queryType": "groupBy",
  "dataSource": "user_log_table",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "category", "value": "1" },
      "aggregator": {
        "type": "thetaSketch", "name": "A_unique_users_1", "fieldName": "user"
      }
    },
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "category", "value": "2" },
      "aggregator": {
        "type": "thetaSketch", "name": "A_unique_users_2", "fieldName": "user"
      }
    }
  ],
  "postAggregations": [
    {
      "type": "thetaSketchEstimate",
      "name": "final_unique_users",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "final_unique_users_sketch",
        "func": "INTERSECT",
        "fields": [
          { "type": "fieldAccess", "fieldName": "A_unique_users_1" },
          { "type": "fieldAccess", "fieldName": "A_unique_users_2" }
        ]
      }
    }
  ],
  "intervals": [
    "2017-09-19T00:00:06+00:00/2017-09-20T00:00:06+00:00"
  ],
  "filter": {
    "type": "or",
    "fields": [
      { "type": "selector", "dimension": "category", "value": "1" },
      { "type": "selector", "dimension": "category", "value": "2" }
    ]
  }
}
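
If more than two categories are involved, writing the filtered aggregators by hand gets repetitive. Below is a rough Python sketch that generates the same query shape for any list of categories. The function name `build_intersect_query`, its parameters, and the `unique_users_<category>` naming are my own assumptions for illustration; only the JSON structure follows the query above.

```python
import json


def build_intersect_query(data_source, interval, categories, user_dim="user"):
    """Build a groupBy query estimating users present in ALL given categories.

    Hypothetical helper: it only assembles the JSON, it does not contact Druid.
    """

    def selector(value):
        return {"type": "selector", "dimension": "category", "value": value}

    # One filtered thetaSketch aggregator per category
    aggregations = [
        {
            "type": "filtered",
            "filter": selector(cat),
            "aggregator": {
                "type": "thetaSketch",
                "name": f"unique_users_{cat}",
                "fieldName": user_dim,
            },
        }
        for cat in categories
    ]

    # Intersect all per-category sketches, then turn the result into an estimate
    post_aggregations = [
        {
            "type": "thetaSketchEstimate",
            "name": "final_unique_users",
            "field": {
                "type": "thetaSketchSetOp",
                "name": "final_unique_users_sketch",
                "func": "INTERSECT",
                "fields": [
                    {"type": "fieldAccess", "fieldName": f"unique_users_{cat}"}
                    for cat in categories
                ],
            },
        }
    ]

    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": [],
        "aggregations": aggregations,
        "postAggregations": post_aggregations,
        "intervals": [interval],
        "filter": {"type": "or", "fields": [selector(c) for c in categories]},
    }


query = build_intersect_query(
    "user_log_table",
    "2017-09-19T00:00:06+00:00/2017-09-20T00:00:06+00:00",
    ["1", "2"],
)
print(json.dumps(query, indent=2))
```

Keep in mind that the value of `final_unique_users` is an estimate, since theta sketches are approximate.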

On Thursday, September 21, 2017 at 11:07:57 AM UTC+9, GyoungJin Gim wrote: