Is there a way to do a theta sketch set operation across more than one data source? Say one dataset contains (product, userid) and another contains (region, userid), and I want to answer a question like: how many unique users use product A and belong to region B?
select THETA_SKETCH_ESTIMATE(THETA_SKETCH_INTERSECT(t1, t2))
from (
  select DS_THETA(userid) filter (where region = 'regionB') t1,
         DS_THETA(userid) filter (where product = 'productA') t2
  from datasource
)
Hi Payel -
I was able to create the two sketches in one query (one for users by product, one for users by region, from different datasets), but got errors about “unable to create plan for query” when I tried to intersect them. Not sure if it’s me, or incomplete support for the operations.
But you might be able to do this kind of query in any case. I suggest going to the Druid console and using Druid SQL, e.g.:
SELECT COUNT(DISTINCT userid)
FROM table1 t1 JOIN table2 t2
ON t1.userid = t2.userid
WHERE t1.product = 'some_product'
AND t2.region = 'some_region'
It probably won’t use theta sketches, but it should give an estimate (or an exact count, if you turn off the “use approximate” options), unless the join covers too many rows or fails with some other error.
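To make clear what that join query counts, here is a minimal Python sketch. It is not Druid code: the rows, the variable names table1 and table2, and the helper count_users_in_both are all made up for illustration. It computes the exact number of distinct userids that appear with the chosen product in one dataset and with the chosen region in the other, which is the number a theta sketch intersection would approximate.

```python
# Hypothetical sample rows; real data would live in the two Druid datasources.
table1 = [("productA", "u1"), ("productA", "u2"), ("productB", "u3"), ("productA", "u2")]
table2 = [("regionB", "u2"), ("regionB", "u3"), ("regionA", "u1")]


def count_users_in_both(products, regions, product, region):
    """Exact count of distinct userids seen with `product` and with `region`."""
    # Distinct users for the chosen product (duplicates collapse in the set).
    product_users = {uid for p, uid in products if p == product}
    # Distinct users for the chosen region.
    region_users = {uid for r, uid in regions if r == region}
    # Users present in both sets, i.e. the set intersection.
    return len(product_users & region_users)


result = count_users_in_both(table1, table2, "productA", "regionB")
print(result)
```

With these sample rows only u2 uses productA and belongs to regionB, so the count is 1.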