I am using Druid to track analytics events for a number of websites. I have been asked to calculate the bounce rate for every domain over a given period, i.e. the percentage of users who navigate away after viewing only one page.
Currently, an event is sent to Druid on every visit with the following info (among others):
Can you recommend some strategy to calculate this particular metric?
If the timestamps tell you when a unique user hit the domain and when they closed the page (assuming every event you get is a hit on the first page), then the difference tells you how long the user stayed on that page. You have to define a threshold for how long a user must stay to decide whether it’s a bounce or not. Also, there should be exactly one pair of events for every unique user: time in and time out for that single page.
If you take the difference between the two timestamps and it falls within your MIN/MAX range, the user didn’t bounce. Otherwise, it’s a bounce. You would need a custom dimension that holds a “bounce” flag (true/false). You can use transforms and simple div math to determine its value.
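A minimal sketch of that check in Python, just to show the logic. The `MIN_STAY` and `MAX_STAY` values and the `is_bounce` helper are made up for illustration; this is not a Druid transform, and the thresholds would have to match your own definition of a bounce.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds (assumptions, not values from Druid or from this thread).
MIN_STAY = timedelta(seconds=10)
MAX_STAY = timedelta(minutes=30)


def is_bounce(time_in: datetime, time_out: datetime) -> bool:
    """Return True when the stay falls outside the [MIN_STAY, MAX_STAY] range."""
    stay = time_out - time_in
    # Inside the range means the user stayed long enough: not a bounce.
    return not (MIN_STAY <= stay <= MAX_STAY)
```

The resulting true/false value is what the “bounce” dimension would hold for each time in / time out pair.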
I was wondering if there is a way to ask Druid how many values
of session_id have a count of exactly 1 in the given time range.
Given the time interval, if I do a groupBy on domain and
session_id, I get for every domain the list of session_ids and,
for each session_id, the number of pages visited.
I could then discard all the session_ids that have visited more
than one page and count the remaining ones, and this is the number
of users that visited only one page.
Then I would run another query, counting the distinct session_ids
for every domain. The ratio between the first number and this one
should be the bounce rate.
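As a rough sketch of the client-side step, assuming the groupBy result has already been loaded into (domain, session_id, page_count) tuples. The `bounce_rates` function and the row layout are hypothetical and only illustrate the arithmetic, not any Druid API.

```python
from collections import defaultdict


def bounce_rates(rows):
    """Compute the bounce rate per domain.

    rows: iterable of (domain, session_id, page_count) tuples,
    one tuple per (domain, session_id) group.
    """
    single_page = defaultdict(int)  # sessions that viewed exactly one page
    sessions = defaultdict(int)     # all distinct sessions seen per domain
    for domain, _session_id, page_count in rows:
        sessions[domain] += 1
        if page_count == 1:
            single_page[domain] += 1
    return {domain: single_page[domain] / sessions[domain] for domain in sessions}
```

Since the grouped result has one row per (domain, session_id), counting its rows per domain already gives the distinct session count, so under that assumption the second query may not be needed.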
This method is obviously not feasible for large data sets. Is
there a less cumbersome way to calculate the number of session_ids
that appear only once in the given interval?