Is it possible to specify the aggregation used in the roll-up in the time dimension as different to that used in the field aggregations across other dimensions?
For example, imagine we have data stored from smart electricity meters. To keep it simple, let’s have a single dimension of meter ID, and a single metric of power usage (rate of using electricity, not the amount used). Let’s assume that we store a reading every minute, and there is no roll-up on ingestion, so query granularity on ingestion is also one minute. Now I want to do a query showing total power usage over all meters, with say one hour granularity. So we want a doubleSum aggregator to aggregate values from the meters, grouped by meter id say. However, within each hour, summing the power usage data points of each meter is meaningless - we really want something representing a “single” usage value, like average, first, last or maximum, for it to have any meaning.
Is there a way to do this in Druid? If so, how?
Have you taken a look at the aggregations documentation (aggregations at query time do not have to be the same aggregations that were used at ingest time). http://druid.io/docs/latest/querying/aggregations.html
You can define straight-forward first/last, min/max directly on your power usage metric. You could calculate an average using a sum aggregation and a count aggregation as separate query fields, then perform the sum / count calculation as a post-aggregation.
Thanks for the reply. Yes I understand all that, and in fact have used post aggregations successfully to create a mean. The problem is that I want the aggregation which happens at query time for the intervals defined by the time granularity to be different to the aggregation which happens for the dimension. My example gives a use case where this is required. So within the hour, for each meter, the power readings must be averaged, then the averages must be summed for all meters, to give a total hourly average power consumption for all meters.
I can’t see any way to do this. Is it something which Druid supports?
Ah, your use case is more clear now. Perhaps check out how to use nested groupBy. Documentation appears to be a little scarce - I think the Druid SQL documentation has the most information. You could try to figure out how to do a nested SQL query and then get the json representation for it.
Thanks Kyle, that gives me some pointers - I’ll have a look.