Is it better to aggregate multiple rows into one row?

Currently I’m monitoring several stats for 10K hosts; each host reports its stats every 10s.

Segments are split by hour; each segment contains about 600 * 10k * 10 = 60M rows.

Our system always queries one stat for several hosts, or several stats for one host, over a range of time.

If I aggregate the 600 samples of a stat per hour into one row, the number of rows will be reduced by a factor of 600.

I don’t understand how Druid stores metrics. If rows are aggregated, will queries speed up significantly (reduced I/O and latency)?

Hi Zang,
Druid has a concept of rollup, where it can be configured to summarize multiple events into a single row based on the unique combination of dimensions.

Refer to the rollup documentation for more details.

FWIW, if in your case all you care about is viewing your stats aggregated per minute, you can configure queryGranularity to MINUTE in your ingestion spec.
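As a rough sketch, the relevant part of the ingestion spec would look something like this (segment granularity, datasource name, and the rest of the spec are placeholders; only the granularitySpec fields are the point here):

```json
{
  "granularitySpec": {
    "segmentGranularity": "HOUR",
    "queryGranularity": "MINUTE",
    "rollup": true
  }
}
```

With queryGranularity set to MINUTE, all events whose timestamps fall in the same minute and that share the same dimension values are summarized into one row at ingestion time.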

Roll-up doesn’t fit my requirement. I don’t want to lose precision; I need at least one sample per 10 seconds.

By aggregation, I mean storing 600 data points in one contiguous array, which can reduce I/O. The process doesn’t lose any data; it only stores the data more compactly. For example:

Time    Host  Type  Value
0s      A     CPU   10
0s      B     CPU   20
0s      A     NET   10
0s      B     NET   10
10s     A     CPU   20
...
3600s   A     CPU   50

I’d store them like this:

Time  Host  Type  Values
0s    A     CPU   [10, 20, ... (array of 600 points) ..., 50]
0s    A     NET   [10, 20, ... (array of 600 points) ..., 50]
0s    B     CPU   [10, 20, ... (array of 600 points) ..., 50]
0s    B     NET   [10, 20, ... (array of 600 points) ..., 50]
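The packing step described above can be sketched in a few lines of Python. This is a minimal illustration of the transformation, not anything Druid does internally; the sample data and the `pack_hourly` helper are made up for this example:

```python
from collections import defaultdict

# Raw samples: (time_s, host, stat_type, value), one row per 10s sample.
samples = [
    (0, "A", "CPU", 10),
    (0, "B", "CPU", 20),
    (0, "A", "NET", 10),
    (0, "B", "NET", 10),
    (10, "A", "CPU", 20),
    (3600, "A", "CPU", 50),
]

def pack_hourly(rows):
    """Group samples by (hour_start, host, type) and pack each group's
    values into one array, ordered by sample time."""
    groups = defaultdict(list)
    for t, host, typ, value in rows:
        hour_start = (t // 3600) * 3600
        groups[(hour_start, host, typ)].append((t, value))
    # Sort each group's samples by time and keep only the values.
    return {
        key: [v for _, v in sorted(points)]
        for key, points in groups.items()
    }

packed = pack_hourly(samples)
# One row per (hour, host, type); e.g. packed[(0, "A", "CPU")] == [10, 20]
```

The row count drops from one row per sample to one row per (hour, host, stat) combination, and each hour's values sit next to each other, which is the I/O argument being made above.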