Aggregations on Druid

Hello,

I’m new to Druid and I have a lot of questions about aggregations; I can’t understand how they work.

Is aggregation at ingestion time only defined by the “metricsSpec” field? If not, how can we use the “Aggregations” field?

My typical example: I want the number of lines per country. Is it smart to try to aggregate this at ingestion time, and is it even possible?

When we aggregate before a query or during a query, is the aggregation stored somewhere? If not, why aggregate?

Thanks,

Ben

Hi Ben,

Have you taken a look at http://druid.io/docs/0.9.0/design/ ?

At ingestion time, you define how your data rolls up, which is the minimum granularity you can group data on later in queries.

Let’s say you defined a rollup at minute granularity. On the query side, this means you can bucket results with a minimum granularity of one minute, but you can also group by hour, day, week, etc.
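As a concrete illustration, here is a minimal sketch of the relevant part of a batch ingestion spec that would roll your lines up per minute and per country (the dataSource name, column names, and granularities are placeholders, adapt them to your data):

  "dataSchema": {
    "dataSource": "events",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": ["country", "site"] }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "count" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "MINUTE"
    }
  }

With a “count” metric like this, “number of lines by country” becomes a groupBy on the “country” dimension that does a longSum over “count” at query time, so yes, that aggregation is both possible and a good fit for ingestion-time rollup.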

Thanks for your answer!
My (admittedly limited) experience has given me some answers, but a lot more questions! :slight_smile:
1- “When we aggregate before a query or during a query, is the aggregation stored somewhere? If not, why aggregate?”
What I mean is: if I want to save, as an aggregate in my segment, the computed average I produce at query time (in “post-aggregations”), can I?

2- Can I have multiple indexing tasks for the same segment? I don’t know how to do that…

3- Can I take a computed segment and create another segment from it? (Like in many cube architectures.)

4- Can I create segments from multiple files? My typical use case: I have many files for one day and I want to produce a segment for each “site”. Is that even possible?

I’m sorry for all these questions, but I’m really curious and I can’t find the answers…
Thanks,
Ben

Hi Benjamin,

  1. The aggregations performed before a query would be stored as the values in the “metric” columns within each segment. I don’t believe that you can store post-aggregation results in the segments; the usual workaround is sketched after this list.

  2. Can you elaborate on your use case?

  3. Yes, you can use a computed segment as the input for another ingestion task, please refer to:

http://druid.io/docs/0.9.0/ingestion/update-existing-data.html

  4. If you’re using batch ingestion, please refer to the IOConfig section for how to use multiple files as input (a minimal example also follows below):

http://druid.io/docs/0.9.0/ingestion/batch-ingestion.html
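Regarding 1: the usual pattern for something like an average is to store the pieces as metrics at ingestion time and compute the ratio at query time with a post-aggregator. A rough sketch, with placeholder metric and column names:

  "metricsSpec": [
    { "type": "doubleSum", "name": "value_sum", "fieldName": "value" },
    { "type": "count", "name": "count" }
  ]

and then in the query:

  "aggregations": [
    { "type": "doubleSum", "name": "value_sum", "fieldName": "value_sum" },
    { "type": "longSum", "name": "count", "fieldName": "count" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "value_avg",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "value_sum" },
        { "type": "fieldAccess", "fieldName": "count" }
      ]
    }
  ]

Regarding 4: with the Hadoop batch indexer, the ioConfig can take several input files as a comma-separated list of paths, roughly like this (the paths are placeholders):

  "ioConfig": {
    "type": "hadoop",
    "inputSpec": {
      "type": "static",
      "paths": "hdfs://data/2016-05-01/file1.json,hdfs://data/2016-05-01/file2.json"
    }
  }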

Thanks,

Jon

Thanks Jon! See inline.

Hi Benjamin,

  1. The aggregations performed before a query would be stored as the values in the “metric” columns within each segment. I don’t believe that you can store post-aggregation results in the segments.

Too bad, that would have been great.

  2. Can you elaborate on your use case?

It’s more of an update here: create a segment from data and then update that segment with new data. (So your third answer also covers this: http://druid.io/docs/0.9.0/ingestion/update-existing-data.html)

  3. Yes, you can use a computed segment as the input for another ingestion task, please refer to:

http://druid.io/docs/0.9.0/ingestion/update-existing-data.html

I found the solution; I was thinking of the IngestSegmentFirehose (http://druid.io/docs/latest/ingestion/firehose.html).

  4. If you’re using batch ingestion, please refer to the IOConfig section for how to use multiple files as input:

http://druid.io/docs/0.9.0/ingestion/batch-ingestion.html

I will try the Hadoop Indexing Task; I’ll let you know how it goes!
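For the update use case (my question 2), the delta-ingestion ioConfig from the update-existing-data page looks roughly like this, as far as I understand it (the dataSource name, interval, and paths are placeholders for my data):

  "ioConfig": {
    "type": "hadoop",
    "inputSpec": {
      "type": "multi",
      "children": [
        {
          "type": "dataSource",
          "ingestionSpec": {
            "dataSource": "events",
            "intervals": ["2016-05-01/2016-05-02"]
          }
        },
        {
          "type": "static",
          "paths": "hdfs://data/2016-05-01/new-file.json"
        }
      ]
    }
  }

It reprocesses the existing segments for the interval together with the new files, which seems to match my “update a segment with new data” case.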