Curious to see if anyone has tackled something similar when using Druid.
To avoid being too verbose, I will just summarize the flattened data spec that Druid is ingesting.
The sessionID is always defined, as it is the short-lived session ID. The sessionId is available in every segment.
In some cases I have a null skuId. I only track a skuId when a user visits a product detail page.
I am trying to determine if Druid is suited for the task below.
I would like to query Druid when a user hits a product detail page. Let's say the SKU value is 12345. During the remainder of the session they go to the following product detail pages:
skuId 54321, skuId 32145 and skuId 55555.
For all of the data events the person has the same sessionID. The data is written out to the spec with the sessionID and the skuId.
So in this case I have committed the following data:
sessionID - 0::Aasdas412asdasd123asdasd (made up ID)
skuId [54321, 32145, 55555, 12345]
Another user comes to the site and performs similar actions. In some cases they view the same skuID as the first user, and in some cases they do not.
sessionID - 0::asd234sdfssaghhddafcd (made up ID)
skuId [54321, 666666, 55555, 897545]
I would like to be able to run an aggregation in real time that backs a recommendation query stating:
“All users who viewed this SKU (54321) have also viewed these other SKUs.” Essentially, it should return a weight (count) for every skuId that was viewed in any session that contains the skuId 54321.
My thought was to first query all of the sessionIds that have the skuID 54321, then iterate over those sessionIds and grab all other SKUs that were viewed in those sessions. Finally, take the count of each SKU and return the SKUs ordered by how often they were viewed in those sessions.
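To make the logic concrete, here is a minimal Python sketch of what I am after, run against plain in-memory tuples rather than Druid. The function name `co_viewed` and the `(sessionID, skuId)` tuple layout are only illustrative assumptions, and I am assuming each SKU should count at most once per session.

```python
from collections import Counter, defaultdict

# Flattened events as (sessionID, skuId) pairs, using the made-up IDs from
# this post. skuId is None for events that are not product detail pages.
events = [
    ("0::Aasdas412asdasd123asdasd", 12345),
    ("0::Aasdas412asdasd123asdasd", None),
    ("0::Aasdas412asdasd123asdasd", 54321),
    ("0::Aasdas412asdasd123asdasd", 32145),
    ("0::Aasdas412asdasd123asdasd", 55555),
    ("0::asd234sdfssaghhddafcd", 54321),
    ("0::asd234sdfssaghhddafcd", 666666),
    ("0::asd234sdfssaghhddafcd", 55555),
    ("0::asd234sdfssaghhddafcd", 897545),
]


def co_viewed(events, target_sku):
    """Return (skuId, session_count) pairs for SKUs seen alongside target_sku."""
    # Step 1: collect the distinct SKUs viewed in each session, skipping nulls.
    skus_by_session = defaultdict(set)
    for session_id, sku_id in events:
        if sku_id is not None:
            skus_by_session[session_id].add(sku_id)

    # Step 2: for every session that contains the target SKU,
    # count each of the other SKUs once.
    counts = Counter()
    for skus in skus_by_session.values():
        if target_sku in skus:
            counts.update(skus - {target_sku})

    # Step 3: order by count, highest first.
    return counts.most_common()


recommendations = co_viewed(events, 54321)
```

For the two sessions above, 55555 comes out on top with a count of 2, and every other SKU has a count of 1. What I cannot work out is how to express these two steps (find the sessions, then count within them) inside Druid itself.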
I have been looking at the datasketches library, but that still seems to be slightly off from what I am trying to do.
I have tried a query of queries (a nested query) in Druid SQL, but filtering on dynamic values (the returned sessionIds) does not seem to be possible.
I have tried filtered aggregations, but I cannot figure out how to grab an array of sessionIds and loop over it to build a list of all ids.
Any help would be greatly appreciated.