Uncorrelated subquery in druid

Hi,

I want to know whether it is possible to query tracking user who purchased on a certain day.

ex)

select dt, count(distinct userid) as uv, count(*) as pv

from access_log

where dt between ‘2016-08-02’ and ‘2016-08-07’

and userid in

(

select userid

from order

where  dt = '2016-08-01'

)

group by dt

How to query this? Is it possible in Druid?

Thanks,

Jay

Hey Jay,

You could do this in a couple of ways:

  1. Exact: two queries, one for the inner query to get a list of users, and then the second query would use that list as an “in” filter.

  2. Approximate: one query using thetaSketch intersections. This would be possible in the next release (Druid 0.9.2) using the new time filtered aggregations feature. There’s an example in the docs for that upcoming version under “retention analysis example”. It hasn’t been released yet, but the docs are on github: https://github.com/druid-io/druid/blob/master/docs/content/development/extensions-core/datasketches-aggregators.md#retention-analysis-example

The thetaSketch method looks really powerful - could that produce a full retention set?
E.g. if (in the retention query example) the granularity was set to DAY, with the various intervals all set to cover 7 days, would the postagg generate a set of 7 x 7 intersection results?

Hi Richard, we are working on making retention queries easier. The 0.9.2 release should help with that.

See: https://github.com/druid-io/druid/pull/3315

Hi Fangjin.

When will 0.9.2 be released?

Thanks

We are aiming for RC end of this month.