Query for path use case


We have the following use case, which we are having trouble implementing in Druid.

We have a datasource called store_visits with the dimensions __time, user_id and store_id.

We want to be able to answer the question:

“How many times did users transition from store “A” to store “B” (in that order, regardless of any intermediate store visits)?”

So, for example, if a user's path was “A->B->C->A->D->B”, we would count two transitions (“A->B” and “A->D->B”).

The closest I got to a solution was a groupBy query on user_id with a JavaScript aggregator that concatenates the store_id values into a comma-separated string, followed by a JavaScript postAggregator that applies a regex to count the transitions.

However, the JavaScript aggregator is expected to return a floating-point value (strings are not supported), so this approach fails.
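
To make the intended counting concrete, here is a minimal sketch in plain Python, outside Druid. It is not a Druid query: the function names and the (time, user_id, store_id) tuple layout are assumptions made for illustration, and it assumes the two stores are different. It counts non-overlapping transitions, which matches the example above.

```python
from collections import defaultdict


def count_transitions(path, src, dst):
    """Count non-overlapping src -> ... -> dst transitions in an ordered path."""
    count = 0
    armed = False  # True once src was seen and no dst has closed it yet
    for store in path:
        if store == src:
            armed = True
        elif store == dst and armed:
            count += 1
            armed = False
    return count


def total_transitions(rows, src, dst):
    """Sum transitions over all users.

    rows: iterable of (time, user_id, store_id) tuples, in any order
    (hypothetical layout mirroring the store_visits datasource).
    """
    paths = defaultdict(list)
    for _time, user_id, store_id in sorted(rows):  # sort by time first
        paths[user_id].append(store_id)
    return sum(count_transitions(p, src, dst) for p in paths.values())
```

Calling count_transitions(list("ABCADB"), "A", "B") walks the example path and counts one transition for “A->B” and one for “A->D->B”.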

Any ideas are appreciated.


Hey Joarley,

What you are asking for is an ‘ordered funnel’ sort of analysis. If you wanted ‘unordered’ I would have asked you to check out Theta Sketches, which are great for that kind of thing. But you want ordered, so never mind that.

I think your basic idea makes sense, but for the reason you mentioned it won’t work with the JavaScript aggregators. We’ve recently been adding array functions that are meant to handle this kind of thing natively, although they haven’t been released yet, and I don’t think an array aggregator has been developed yet, which you would need to complete the picture.

For the time being I’d suggest writing a Druid extension. If you write an aggregator as an extension, you can define any sort of behavior and intermediary data format that you want. For an example, check out: https://github.com/implydata/druid-example-extension
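
As a rough illustration of what such an aggregator's intermediary format could look like, here is a minimal Python sketch. It does not use Druid's extension API (a real extension would be written in Java against Druid's aggregator interfaces); the class PathState and its methods add, combine and finalize are hypothetical names. It assumes the per-user state is a time-sorted list of visits, so partial results from different segments can be merged before counting, and that the two stores are different.

```python
import bisect


class PathState:
    """Hypothetical per-user intermediate state: visits kept sorted by time."""

    def __init__(self, visits=None):
        # Each visit is a (time, store_id) tuple.
        self.visits = sorted(visits or [])

    def add(self, time, store_id):
        """Fold one input row into the state, keeping time order."""
        bisect.insort(self.visits, (time, store_id))

    def combine(self, other):
        """Merge two partial states, e.g. results from different segments."""
        return PathState(self.visits + other.visits)

    def finalize(self, src, dst):
        """Count non-overlapping src -> ... -> dst transitions."""
        count = 0
        armed = False  # True once src was seen and not yet closed by dst
        for _time, store in self.visits:
            if store == src:
                armed = True
            elif store == dst and armed:
                count += 1
                armed = False
        return count
```

The point of the design is that ordering is only resolved at finalize time, so rows can arrive in any order and partial states can be combined freely. The cost is that the state grows with the number of visits per user.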

Hope this helps, and of course, stay tuned for future Druid releases that include more array functions!!