I have been experimenting a bit with Druid and I have the following question: in a query, is it possible to specify a “granularity aggregator” (on the time dimension) that is different from the aggregator applied across the other dimensions?
For instance, let’s say I have 2 servers, each one with a certain number of active connections to another system. This number of active connections reflects a state at a given time and can go up and down. Every minute, each server sends its number. In Druid I have this data (server is a dimension):
11:03 server=server1 12
11:03 server=server2 42
11:04 server=server1 24
11:04 server=server2 33
11:05 server=server1 19
11:05 server=server2 31
Let’s say I want to draw a graph showing the maximum number of active connections, across servers, with an “hour” granularity (just an example, it could be “15min”, “day”, …).
If I run the query with “hour” as the granularity and “sum” as the aggregator, it gives me: sum(12,42,24,33,19,31) = 161, which doesn’t mean anything in my case (it makes no sense to sum successive values reported by the same server).
Instead I would like to issue a query with:
- “hour” as the granularity
- “max” as the aggregator against the time dimension
- “sum” as the aggregator against other dimensions (server here)
The query would do:
- Apply the granularity aggregator (max) on the granularity period
11:00 server=server1 24
11:00 server=server2 42
- Apply the dimensions aggregator (sum)
–> And thus I would get 66.
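To make the intended semantics concrete, here is a minimal sketch of the two steps in plain Python, run on the sample data above. It is not a Druid query: the names `ROWS`, `hour_bucket` and `max_then_sum` are made up for illustration, and timestamps are assumed to be simple "HH:MM" strings.

```python
from collections import defaultdict

# Sample rows from above: (time "HH:MM", server, active connections)
ROWS = [
    ("11:03", "server1", 12),
    ("11:03", "server2", 42),
    ("11:04", "server1", 24),
    ("11:04", "server2", 33),
    ("11:05", "server1", 19),
    ("11:05", "server2", 31),
]


def hour_bucket(ts):
    """Truncate an "HH:MM" timestamp to its hour, e.g. "11:03" -> "11:00"."""
    return ts.split(":")[0] + ":00"


def max_then_sum(rows):
    """Take the max per (hour, server), then sum those maxima per hour."""
    # Step 1: "granularity aggregator" (max) within each hour, per server.
    peak = {}
    for ts, server, value in rows:
        key = (hour_bucket(ts), server)
        peak[key] = max(peak.get(key, value), value)

    # Step 2: "dimension aggregator" (sum) across servers, per hour.
    totals = defaultdict(int)
    for (hour, _server), value in peak.items():
        totals[hour] += value
    return dict(totals)


# What a plain "sum" aggregator at hour granularity produces.
naive_total = sum(value for _, _, value in ROWS)

print(naive_total)         # 161
print(max_then_sum(ROWS))  # {'11:00': 66}
```

The point of the sketch is only that the two aggregations are applied in sequence, with the time-based one first; what I am looking for is a way to have Druid do this server-side.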
Does it make sense? Do you know how I could achieve this?
Of course I could do it “manually” by running a groupBy query on all dimensions with the minute granularity and aggregating the results myself in post-processing, but that means transferring a lot of data to the caller (and probably a performance impact).
To illustrate differently, it seems that OpenTSDB calls this “downsampling”: http://opentsdb.net/docs/build/html/user_guide/query/index.html#downsampling