Correlative subquery-like with Druid

I am currently facing an issue with a specific query and I think what I want is some kind of a “correlative subquery” (as in classic SQL), but I don’t think Druid can do that, in fact I think it cannot do that because it would not be really efficient, am I wrong ?

What I want to do is grouping and counting events by time (for each type of event) with say, a “day” granularity.

Each event has a userId associated with it and a type.

I’d like to filter the data for the first query and only keep events IFF another event with the same specific userId (but with another type) exists.

Take this events list:

{ “batchId”: 1, “type”: “TYPE_A”, “timestamp”: “2017-02-22T10:00:00Z” }

{ “batchId”: 1, “type”: “TYPE_B”, “timestamp”: “2017-02-22T12:00:00Z” }

{ “batchId”: 1, “type”: “TYPE_C”, “timestamp”: “2017-02-22T14:00:00Z” }

{ “batchId”: 2, “type”: “TYPE_B”, “timestamp”: “2017-02-22T12:00:00Z” }

{ “batchId”: 2, “type”: “TYPE_C”, “timestamp”: “2017-02-22T14:00:00Z” }

I’d like to group and count by hour, the number of TYPE_B and TYPE_C events (regardless of the userId), but only include events if a TYPE_A event exists, in this case, only batchId 1 would be used and the final results would be something like:


{ “timestamp” : “2017-02-22T12:00:00”, “event”: { “typeB_Count”: 1 } },

{ “timestamp” : “2017-02-22T14:00:00”, “event”: { “typeC_Count”: 1 } }


So, what do you think of this please ? Is there any way of doing it efficiently ? or maybe the way I store events initially is wrong ?

How would you do it ?

Thanks in advance !