My use case might not be a great fit for Druid, but I guess there's no harm in asking. Is there an optimal way to achieve it without resorting to two queries? Approximations will work.

Let's say we are looking at the last 14 days of data. The use case looks like this:
- All users that have the attribute on the 14th day (as in, 14 days ago) should reflect the aggregation from the 7-14 time frame (7 days).
- All users that have the attribute on the 13th day (as in, 13 days ago) should reflect the aggregation from the 6-13 time frame (7 days).
- All users that have the attribute on the 8th day (as in, 8 days ago) should reflect the aggregation from the 1-7 time frame (7 days).
Each input row knows when the event occurred, as in it knows that the event occurred n days ago. We need to get the aggregate of the measure for the 7-14 interval, where each measure is an aggregate of the leading 7 days.
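To make the windows concrete, here is a tiny Python sketch (my own illustration, nothing Druid-specific) that maps each attribute day to the window it should aggregate over. I'm assuming the window for attribute day d runs from day d-7 to day d, matching the 14 → 7-14 and 13 → 6-13 examples above; adjust the endpoints to whatever convention you actually use.

```python
# Illustration only: for each "attribute day" d (days ago), print the
# window whose aggregate that day should reflect. The start/end
# convention (d - 7 .. d) is my assumption; tweak it to your definition.
for d in range(14, 7, -1):  # 14, 13, ..., 8 days ago
    start, end = d - 7, d
    print(f"attribute on day {d} -> aggregate window {start}-{end}")
```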
The way I think it could be done is to have 14 dimensions (offset_1 through offset_14), where the offset represents the bucket that the raw row has to be pushed into, based on when the attribute appeared. So if Druid gets a row today with an event for offset_7, as in the event occurred 7 days ago, the measure is put into that bucket (offset_7, the rest being nulls/0).
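Here is a minimal sketch of that bucketing idea in plain Python (not an actual Druid ingestion/transform spec; the offset_1 ... offset_14 column names are just the ones implied above):

```python
from datetime import date

def bucket_row(ingest_day: date, event_day: date, measure: float) -> dict:
    """Turn a raw row into a row with 14 offset columns, where the measure
    lands in offset_n for an event that happened n days before ingestion.
    All other offset columns stay 0 (or null, depending on your schema)."""
    n = (ingest_day - event_day).days
    row = {f"offset_{i}": 0 for i in range(1, 15)}
    if 1 <= n <= 14:
        row[f"offset_{n}"] = measure
    return row

# Example: a row arriving today for an event that happened 7 days ago
# puts the measure in offset_7 and leaves the rest at 0.
print(bucket_row(date(2024, 1, 15), date(2024, 1, 8), 3.0))
```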
In the post-aggregations, for a target day we look at the aggregates for each bucket (partitioned on time, of course) for the relative days and aggregate them. For example, if the query is for today, the interval is 7-14 days ago, and we aggregate up the offsets 1-7 for that interval.
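And a sketch of the query-time side, again in plain Python standing in for the actual post-aggregation (the day_aggregates shape and the rolling_measure helper name are made up for illustration):

```python
def rolling_measure(day_aggregates: dict, interval_days: list) -> float:
    """Sum offsets 1-7 across the per-day bucket aggregates that fall in
    the queried interval, mirroring the post-aggregation step above."""
    total = 0.0
    for day in interval_days:
        buckets = day_aggregates.get(day, {})
        total += sum(buckets.get(f"offset_{i}", 0.0) for i in range(1, 8))
    return total

# Example: query for "today" over the interval of days 7-14 ago.
day_aggregates = {
    "day_7_ago":  {"offset_3": 2.0, "offset_9": 1.0},
    "day_10_ago": {"offset_1": 4.0},
    "day_14_ago": {"offset_7": 3.0, "offset_12": 5.0},
}
interval = ["day_7_ago", "day_10_ago", "day_14_ago"]
print(rolling_measure(day_aggregates, interval))  # only offsets 1-7 count -> 9.0
```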
Does that sound feasible? I could not come up with any other way.