Aggregate metric strategies

We have a spec that contains periodic information about the state of our service instances. As a quick example, one of the things it tracks is disk usage:

{ "service": "test-service", "diskUsage": 28479122 }

I'd like to do a query that shows total disk usage across all services. Sounds like I need a longSum metric in my spec? Well, not exactly. The problem I have is representing this data accurately in aggregate across different granularities. Let's say we emit this info every minute for all services. If I want to show the 15-minute total, it should logically be an average: I should divide the sum of all diskUsage fields by the number of reports made (15).
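For concreteness, here is a rough sketch of what I imagine the ingestion side could look like, storing both the running sum and the number of reports received (the metric names totalDiskUsage and reportCount are placeholders I made up):

"metricsSpec": [
  { "type": "longSum", "name": "totalDiskUsage", "fieldName": "diskUsage" },
  { "type": "count", "name": "reportCount" }
]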

But wait! What if something happens and one of the emissions fails, so there are only 14 reports in a 15-minute interval? Now the values are going to be 1/15 smaller than they should be. Is there a way I can generate/store this data differently so I can report on total usage?
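One idea I've had: with a count metric stored alongside the sum (as in the sketch above), a query could divide by the actual number of reports instead of assuming 15. Something like the following timeseries query, where "service-metrics" is a made-up datasource name (note that the ingestion-time count has to be re-summed with longSum at query time, since a plain count there would only count rolled-up rows):

{
  "queryType": "timeseries",
  "dataSource": "service-metrics",
  "granularity": "fifteen_minute",
  "intervals": ["2015-09-01/2015-09-02"],
  "aggregations": [
    { "type": "longSum", "name": "totalDiskUsage", "fieldName": "totalDiskUsage" },
    { "type": "longSum", "name": "reportCount", "fieldName": "reportCount" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avgDiskUsage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "totalDiskUsage" },
        { "type": "fieldAccess", "fieldName": "reportCount" }
      ]
    }
  ]
}

I'm not sure that's the right model, though, hence the question.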

Hi Taylor,

To me it sounds like you need to decide how to handle this case when ingesting the data into Druid. For example, if the request for 'diskUsage' times out because of an outage, then consider ingesting the maximum disk size as a default value. By doing so you ensure that you always have the correct number of data points for your query.
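As a sketch, when the measurement fails the collector could emit a placeholder record instead of emitting nothing (the value here is just a made-up 32 GiB maximum):

{ "service": "test-service", "diskUsage": 34359738368 }

That way each 15-minute interval still contains exactly 15 reports and the average stays meaningful.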

With best regards

Martin