What does "we lose the ability to query individual events" mean?

In the docs, I found this:

"In practice, we see that rolling up data can dramatically reduce the size of data that needs to be stored (up to a factor of 100). Druid will roll up data as it is ingested to minimize the amount of raw data that needs to be stored. This storage reduction does come at a cost; as we roll up data, we lose the ability to query individual events."

What does the last sentence mean?

Hi xuzhe,

The section following that excerpt goes into more detail:

"Phrased another way, the rollup granularity is the minimum granularity you will be able to explore data at and events are floored to this granularity. Hence, Druid ingestion specs define this granularity as the queryGranularity of the data. The lowest supported queryGranularity is millisecond."

It means that Druid will aggregate the metrics columns for each unique timestamp-dimension tuple and store a single row for each tuple, instead of the individual raw events.

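For example, if you had two rows:

{time: 123, dimA: "hello", metricA: 5}

{time: 123, dimA: "hello", metricA: 15}

with a sum defined on metricA, Druid will store the single row:

{time: 123, dimA: "hello", metricA: 20}

The two original events can no longer be queried separately; only their combined value of 20 remains.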
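The same idea can be sketched in a few lines of Python. This is only an illustration of the rollup concept, not Druid's actual implementation: the function name `rollup` and the parameter `granularity_ms` are made up for this example, and it assumes timestamps are integer milliseconds and that the only metric is a sum on metricA.

```python
from collections import defaultdict


def rollup(events, granularity_ms=1):
    """Sum metricA for each unique (floored time, dimA) tuple.

    Hypothetical helper for illustration only; granularity_ms stands in
    for the queryGranularity that timestamps are floored to.
    """
    totals = defaultdict(int)
    for event in events:
        # Floor the timestamp to the start of its granularity bucket.
        bucket = event["time"] - event["time"] % granularity_ms
        totals[(bucket, event["dimA"])] += event["metricA"]
    # Emit one row per unique tuple, ordered by time and then dimension.
    return [
        {"time": t, "dimA": d, "metricA": m}
        for (t, d), m in sorted(totals.items())
    ]


events = [
    {"time": 123, "dimA": "hello", "metricA": 5},
    {"time": 123, "dimA": "hello", "metricA": 15},
]

print(rollup(events))
# [{'time': 123, 'dimA': 'hello', 'metricA': 20}]
```

With a coarser granularity, for instance `rollup(events, granularity_ms=100)`, the stored timestamp becomes 100, so events that arrived at different milliseconds within the same bucket would also collapse into one row.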

  • Jon

Yep, I read the docs again and again and finally got the point. Thanks for your answer.

On Wednesday, June 15, 2016 at 5:02:32 AM UTC+8, Jonathan Wei wrote:

FWIW, this behavior will change with https://github.com/druid-io/druid/pull/3020