Our company currently uses the Codahale Metrics library for our application metrics, e.g. timers (histogram and meter), gauges, etc. These metrics are collected on a regular schedule and streamed to InfluxDB via a Kafka topic. The Codahale metrics are already pre-calculated (percentiles, averages, min and max values, etc.), so no further processing is required other than storing them in InfluxDB. Each row already contains all of these values in individual columns, e.g. 50th percentile, 98th percentile, 1-minute rate, 5-minute rate, etc.
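For illustration, a single pre-calculated event on the Kafka topic looks roughly like this (the field names and values here are hypothetical; the real payload has one column per Codahale statistic):

```json
{
  "timestamp": "2017-01-15T10:00:00Z",
  "metricName": "http.request.latency",
  "p50": 12.3,
  "p98": 45.6,
  "min": 1.1,
  "max": 98.7,
  "mean": 15.4,
  "oneMinuteRate": 120.5,
  "fiveMinuteRate": 110.2
}
```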
We already use Druid for storing product usage information and are investigating whether we can store this metrics data in Druid as well, which would mean InfluxDB is no longer required in our solution.
The issue I have run into, however, is that Druid seems to be designed around raw values that it then aggregates. For example, rather than accepting an already pre-calculated 50th-percentile value as-is, the metrics section of the ingestion spec requires it to be given an aggregation type such as doubleSum, min, average, etc. I just want to store the values as they are ingested, since no calculation or aggregation is required for this data. I have upgraded to Druid 0.9.2 so that I can disable rollup, as I want each event stored with no aggregation. At query time we would simply pull back that same raw data and plot a timeseries of each value against its timestamp.
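A minimal sketch of the dataSchema I understand would be needed in 0.9.2 to disable rollup (the dataSource name and metric field names are placeholders, and the aggregator types are the part I am unsure about, since with rollup disabled they seemingly should never be applied at ingestion time):

```json
{
  "dataSchema": {
    "dataSource": "app-metrics",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "iso" },
        "dimensionsSpec": { "dimensions": ["metricName"] }
      }
    },
    "metricsSpec": [
      { "type": "doubleSum", "name": "p50", "fieldName": "p50" },
      { "type": "doubleSum", "name": "p98", "fieldName": "p98" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE",
      "rollup": false
    }
  }
}
```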
Is it possible to achieve what I’m looking for, or am I misunderstanding how Druid should be used? Should every metric value be specified as an average in the ingestion spec, to cover cases where the query granularity means more than one row falls within an interval? Or is this simply not the intended purpose of Druid, and the data would be better suited to a different time-series database, i.e. we keep InfluxDB for this?
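For completeness, the kind of query I had in mind for pulling the raw rows back is a plain select query along these lines (dataSource, interval, and field names are placeholders):

```json
{
  "queryType": "select",
  "dataSource": "app-metrics",
  "granularity": "all",
  "intervals": ["2017-01-01/2017-01-02"],
  "dimensions": ["metricName"],
  "metrics": ["p50", "p98"],
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 100 }
}
```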