Re: [druid-user] does anyone know how to calculate derivate from cumulative data?

I think there’s a way to do this in queries, but I’m not sure if it’s a good idea at scale - it uses a join (subquery), which can eat up heap to process, depending on the number of dimensions and rows. (Basically, you’ll have to pull the right-hand table into memory to do the join). But…

You might be able to use a join and timestamp_shift in druid queries, or time_shift in druid sql queries.
Here’s an example in druid sql:

I made a small datasource, “inline_data”, with three rows, minute granularity, one dimension and one metric, that goes from 1 to 3 to 7.

Here’s a sample query that first does a subquery, shifting __time up one minute, to see what the next interval’s time will be, and then joins on that and does the diff:

SELECT

t1.__time,

t1.dim1,

t1.metric1 as metric_now,

t0.metric1 as metric_prev,

t1.metric1 - t0.metric1 as diff

FROM inline_data AS t1

JOIN

(SELECT

TIME_SHIFT(__time,‘PT1M’,1) as next_time,

dim1,

metric1

FROM inline_data) AS t0

ON t1.dim1 = t0.dim1

AND t1.__time = t0.next_time

(I selected “metric_now” and “metric_prev” just to show the original metrics, too.) Here’s the output.

So I get 2 rows, with deltas (“diff”) for the three original rows.

I expect the same could be done using timestamp_shift in a native query. However, again, I still don’t know whether it’s a good idea, since it involves a join, and depending on your query, memory usage might get huge.

Sounds good - let us know how it goes!