Merging Event Data

Hi
I am trying to understand how we would achieve the following functionality in Druid.

A data record would consist of two parts. The first part would contain 90% of the data for that request, and the second part would contain some activity data.

For example, the first part would contain a user’s unique id along with their demographic and other information, such as browser (and derived data), IP address (and derived data), etc.

The second part would contain the ids of all pages visited by that user over a session.

So at the start of the session we will send a session_start event with all the demographic data.

On subsequent clicks, we will just send the user’s session id and the page id.

We want to store the data in Druid so that we can run timeseries queries on all of it.

Example queries would be:

  1. Show total visits in the last 24 hours, broken down by hour

  2. Show total visits for a particular page

  3. Similarly, show total visits for a particular OS, etc.

How would we go about sending this data to Druid/Tranquility so that the two kinds of records are combined before being stored in Druid?

Take a look at delta ingestion: http://druid.io/docs/latest/ingestion/batch-ingestion.html (search for “delta ingestion”)

Hi
The delta ingestion feature mentions merging data for a segment.

How do I tell Druid to merge two records based on a unique request id?

For such a merge to happen, will we not need a way to specify the field on which to join the two records?

Hi Sanket, if you need to do a join of two records, you should take a look at doing this logic in a stream processor and sending the output of the stream processor into Druid.
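
To make that concrete, here is a minimal sketch of the join, written as plain Python rather than against any particular stream processor. The field names (type, session_id, page_id, timestamp, os and so on) and the function name are assumptions for illustration, not anything Druid requires. It keeps the session_start attributes in an in-memory dict and copies them onto every later page-view event for the same session, so each row sent to Druid is already flat:

```python
from typing import Dict, Iterable, Iterator


def join_session_events(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich each page-view event with the attributes sent at session start.

    Hypothetical event shapes (assumed for this sketch):
      {"type": "session_start", "timestamp": ..., "session_id": ..., <demographics>}
      {"type": "page_view", "timestamp": ..., "session_id": ..., "page_id": ...}
    """
    sessions: Dict[str, dict] = {}  # session_id -> demographic attributes

    for event in events:
        session_id = event["session_id"]
        if event["type"] == "session_start":
            # Keep the demographic fields, drop the per-event bookkeeping fields.
            sessions[session_id] = {
                key: value
                for key, value in event.items()
                if key not in ("type", "timestamp", "session_id")
            }
        elif event["type"] == "page_view":
            attrs = sessions.get(session_id)
            if attrs is None:
                # No session_start seen for this session; this sketch drops the event.
                continue
            # Emit one flat, denormalized row per page view.
            yield {
                "timestamp": event["timestamp"],
                "session_id": session_id,
                "page_id": event["page_id"],
                **attrs,
            }


events = [
    {"type": "session_start", "timestamp": "2016-01-01T10:00:00Z",
     "session_id": "s1", "user_id": "u1", "os": "linux", "browser": "firefox"},
    {"type": "page_view", "timestamp": "2016-01-01T10:00:05Z",
     "session_id": "s1", "page_id": "p1"},
    {"type": "page_view", "timestamp": "2016-01-01T10:00:07Z",
     "session_id": "s2", "page_id": "p9"},  # no matching session_start
]
rows = list(join_session_events(events))
```

Each row in rows carries page_id together with os, browser and the other session attributes, so filtering or grouping by page or by OS needs no join at query time. In a real stream processor the dict would be replaced by the framework's fault-tolerant state store, sessions would be expired after some timeout, and page views that arrive before their session_start would be buffered rather than dropped.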