Event-driven query approach

Hi all,

We are building a distributed system that uses extensive logging to capture events, and we will be using the logs to determine whether certain events have occurred.

I have been tasked with creating a query that will review logs containing the following data:

  • when a user logs on

  • when they start a piece of work

  • when they finish a piece of work

  • when they start the next piece of work

  • etc etc

  • when they log off

The idea is to calculate the “down” times between pieces of work. Assume that each of the above events will be captured as a separate log item.

What is the best way to approach this problem using the standard Druid features?

Thanks in advance!

Mikaere

Hi Mikaere, please see inline.

Hi all,

We are building a distributed system that uses extensive logging to capture events, and we will be using the logs to determine whether certain events have occurred.

I have been tasked with creating a query that will review logs containing the following data:

  • when a user logs on
  • when they start a piece of work
  • when they finish a piece of work
  • when they start the next piece of work
  • etc etc
  • when they log off

The idea is to calculate the “down” times between pieces of work. Assume that each of the above events will be captured as a separate log item.

If you were to write the queries you have in mind in SQL, what would they look like? FWIW, Druid currently supports only single-table (datasource) operations, so data should be denormalized before it is loaded into the system. Druid also does not support joins, although some client libraries have added limited support for them (e.g. https://github.com/srikalyc/Sql4D). Your use case sounds like funnel analysis, and it may require some work at the ETL layer before the data is loaded into Druid.
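To make the ETL suggestion a bit more concrete, here is a rough sketch of what that pre-processing step could look like in Python. It is only an illustration under assumptions: the event names ("logon", "work_start", "work_finish", "logoff"), the record fields ("user", "ts", "type") and the function name down_times are all made up here, since the thread does not give a schema. It also assumes that a gap only counts as "down" time when a finished piece of work is followed by another start in the same session.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def down_times(events):
    """Turn raw log events into one denormalized row per idle gap.

    An idle gap runs from a user's "work_finish" to that user's next
    "work_start". Event names and fields are hypothetical.
    """
    rows = []
    # Order by user, then by time, so each user's events are contiguous.
    ordered = sorted(events, key=itemgetter("user", "ts"))
    for user, user_events in groupby(ordered, key=itemgetter("user")):
        last_finish = None
        for event in user_events:
            if event["type"] == "work_finish":
                last_finish = event["ts"]
            elif event["type"] == "work_start" and last_finish is not None:
                gap = (event["ts"] - last_finish).total_seconds()
                rows.append({
                    "timestamp": last_finish.isoformat(),
                    "user": user,
                    "idle_seconds": gap,
                })
                last_finish = None
            elif event["type"] == "logoff":
                # Assumption: time spent logged off is not "down" time.
                last_finish = None
    return rows


if __name__ == "__main__":
    sample = [
        {"user": "alice", "ts": datetime(2015, 1, 1, 9, 0), "type": "logon"},
        {"user": "alice", "ts": datetime(2015, 1, 1, 9, 5), "type": "work_start"},
        {"user": "alice", "ts": datetime(2015, 1, 1, 9, 30), "type": "work_finish"},
        {"user": "alice", "ts": datetime(2015, 1, 1, 9, 40), "type": "work_start"},
    ]
    for row in down_times(sample):
        print(row)
```

Each output row is already flat (a timestamp, a user and a numeric idle_seconds value), so once loaded it could be summed or averaged per user with ordinary single-datasource aggregations, with no join needed at query time.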

Thanks for the info. I didn’t think Druid natively supported this, but since I am a n00b, I thought I would double-check :)

Cheers!