I am looking for some advice related to how to store & query data.
I am using druid for tracking the statuses of machines in a data centre. The status of a machine can be created, up, down and decommissioned. For this data, I have to run queries such as the following:
Number of machines which are active (not decommissioned) right now
Number of machines which are up right now
weekly/monthly time-series of created/up/down/decommissioned machines
weekly/monthly time-series of active machines
Based on the above queries, the right way to store the data seems to be to store the events as they occurred - as opposed to single machine records.
So something like this:
__time, machine_id, status
As opposed to:
__time, machine_id, created_at, decommisioned_at, up_at, down_at
The first schema might result in slightly more complex queries. However, the second schema loses the history of machines i.e. timestamps of multiple up & down events are lost. Also in druid, it doesn’t seem possible to update a single record. So when a machine goes down, I have no way of finding a single record and just updating a value on that record. Any thoughts?
My second question is on a particular type of query: the number of machines active in a month. So I have created_at and decommisioned_at timestamps. Given this time difference, I need to create a per-month metric. I found examples of SQL queries for how to do that. However, it seems like a common enough requirement - find monthly active X. And I am wondering if there are any functions that druid provides out of the box to generate this type of time series.