[druid-user] Schema & query for tracking machine status

Hello,

I am looking for some advice related to how to store & query data.

I am using druid for tracking the statuses of machines in a data centre. The status of a machine can be created, up, down and decommissioned. For this data, I have to run queries such as the following:

  • Number of machines which are active (not decommissioned) right now

  • Number of machines which are up right now

  • weekly/monthly time-series of created/up/down/decommissioned machines

  • weekly/monthly time-series of active machines

Based on the above queries, the right way to store the data seems to be to store the events as they occurred - as opposed to single machine records.
So something like this:
__time, machine_id, status

As opposed to:
__time, machine_id, created_at, decommisioned_at, up_at, down_at

The first schema might result in slightly more complex queries. However, the second schema loses the history of machines i.e. timestamps of multiple up & down events are lost. Also in druid, it doesn’t seem possible to update a single record. So when a machine goes down, I have no way of finding a single record and just updating a value on that record. Any thoughts?

My second question is on a particular type of query: the number of machines active in a month. So I have created_at and decommisioned_at timestamps. Given this time difference, I need to create a per-month metric. I found examples of SQL queries for how to do that. However, it seems like a common enough requirement - find monthly active X. And I am wondering if there are any functions that druid provides out of the box to generate this type of time series.

Thanks!
A.

Hi A.,

In answer to your first question, you can update existing data in Druid. For your particular case, lookups might be the answer.

For your second question, can you share the SQL examples you found? They’re probably translatable to Druid SQL.

Best,

Mark

Thanks, Mark.

For the second Q, I am trying to come up with a query which will generate a list of months for me.

I tried this:
with cte as
(
select TIMESTAMPADD(DAY, 1, “first_month”) as ym from (
select FLOOR(“created_at” TO MONTH) as “first_month” from
(
select min("__time") as “created_at” from “machine_events”
)
)
UNION ALL
SELECT TIMESTAMPADD(MONTH, 1, ym)
FROM cte
WHERE TIMESTAMPADD(MONTH, 1, ym) <= CURRENT_TIMESTAMP
)
select * from cte

However, this does not work because cte is not found in the select after union all.

I also tried this:

select TIMESTAMPADD(DAY, 1, “first_month”) as ym from (
select FLOOR(“created_at” TO MONTH) as “first_month” from
(
select min("__time") as “created_at” from “machine_events”
)
)
UNION ALL
(
SELECT TIMESTAMPADD(MONTH, 1, “ym”)
FROM (
select TIMESTAMPADD(DAY, 1, “first_month”) as ym from (
select FLOOR(“created_at” TO MONTH) as “first_month” from
(
select min("__time") as “created_at” from “machine_events”
)
)
)
WHERE TIMESTAMPADD(MONTH, 1, ym) <= CURRENT_TIMESTAMP
)

However, this only generates two months in the result - instead of a series of months.

Any suggestions on how to tackle this?