Computing session length

Hi Folks,

Quick question. We’re using Druid for analytics (like most people!) and we track individual requests.

However, where I’m a little lost is how we could work out the session length from the individual requests? We do have a session ID which we can filter on, but I need to figure out:

  1. How to compute the session length (End Time - Start Time, for a given filter)
  2. How to compute the average session length using all the the data in Druid
    Hopefully that makes sense :slight_smile:

Thanks in advance,

John-Daniel Trask

I think you should look into min/max aggregators and then post aggregation.
There is no much doc on it sadly, so i guess you should wait for other comments maybe :confused:

It looks like what you want to do will need three stages - getting the start/end times together, computing the session length per session, then aggregating the results to find the mean. You need to run the first two steps in a subquery (as Benjamin says) aggregating to find the min/max grouped by sessionId, then postAggregated to get max-min.

That subquery is then the datasource for an outer query which aggregates your session lengths to find a mean.

Big problem with this: your subquery results are limited to (iirc) 500k rows, which can be easy to exceed - especially for a session-based subquery.

Personally, I would preprocess your data to find session lengths and add them as a metric (I have a similar use case, and will be doing this) - this will remove a reason to keep the unique events in Druid. You really don’t want to store the row-level data in Druid unless absolutely necessary, as it will increase the storage use, and really hamper performance…

The subquery results limit will go away with 0.9.2 with a completely rewritten groupBy engine.