Javascript Custom Aggregation and other Questions

Hi,

In your documentation I see this:

{ "type": "javascript",
  "name": "<output_name>",
  "fieldNames"  : [ <column1>, <column2>, ... ],
  "fnAggregate" : "function(current, column1, column2, ...) {
                     <updates partial aggregate (current) based on the current row values>
                     return <updated partial aggregate>
                   }",
  "fnCombine"   : "function(partialA, partialB) { return <combined partial results>; }",
  "fnReset"     : "function()                   { return <initial value>; }"

However, it is very unclear what these properties represent. Can you please explain what fieldNames, fnAggregate, fnCombine, and fnReset are with a simple example.

My use case is that I have an input row with metrics A and B.

I would like to define a metric C where C = A*B so that after ingestion I can use C in aggregations. What’s the best way to accomplish this…

Also, I have a few questions on querying in general:

Is it possible to filter based on metrics or can you only filter based on dimensions? For instance if I had a metric Amount and I wanted to filter out the rows where Amount < 500, what would be the best way to approach this? Would I have to make Amount a dimension and then write a JS filter to parse the string to an integer and check if that integer is less than 500?

Another question - how could I groupby days of the week. If I wanted metrics for Sunday, Monday, …? Is this possible?

Finally, is there a way to query Druid for datasource metadata. In other words, can you get from Druid all of the ingested datasources as well as their specs (dimensions and metrics)?

Sorry that these questions are all over the place.

Hi Austin,

So far I didn’t use the javascript aggregation, but the case you are describing should be solvable with a arithmetic postaggregatio.

postAggregation : {
  "type"  : "arithmetic",
  "name"  : "metricC",
  "fn"    : "*",
  "fields": [ {"type": "fieldAccess","name": "metricA","fieldName": "metricA"},
                {"type": "fieldAccess","name": "metricB","fieldName": "metricB"}
            ]
}

``

If you want to get only the values above a treshhold than use “haveing” (see here) in one groupBy query.

{
    "type": "lessThan",
    "aggregation": "Amount",
    "value": 500
}

``

If you add to each entry a dimension which holds the weekday, than you use a filter to get just all the data for Monday i.e.

"filter": { "type": "selector", "dimension": "weekday", "value": "Monday"}

``

There are three query types which gives you metadata:

  • Data Source Metadata Queries: see doc
  • Segment Metadata Queries: see doc
  • Time Boundary: see doc
    I hope that is helpful for you.

With best regards

Martin Schneider

Awesome response Martin!

Just wanted to add one more thing, on the brokers there are endpoints
that can also give you information about the various data sources,
they are under the "GET" portion of the Broker docs:

http://druid.io/docs/0.7.3/Broker.html

There are limitations for what they can show you on top of real-time
segments, but once you have some historical segments, they can show
you information about what's in them.

--Eric

I appreciate your response!

Unfortunately, I don’t think your post-aggregation solves the problem. If I have the following data:

Price/unit Units Sold

Record 1: 2 3

Record 2: 1 3

Record 3: 3 1

And I want to calculate total revenue, it would be done as follows:

total revenue = 23 + 13 + 3*1 = 12

What you’re post-aggregation would do I’m afraid is compute (2 + 1 + 3) * (3 + 3 + 1) = 42

Maybe this is a problem that Druid cannot solve and I need to do the computation of

record revenue pre-ingestion (add a column).

I would still like someone to explain to me the different fields in the custom javascript aggregator!

Keep in mind that Druid is focused on timeseries. So what you mentioned one solution on your own, add a new metric for each entry, which holds the revenue = units * ‘price/unit’ and still keep the other dimensions.

If you look here and goto the JavaScript post-aggregator you find the general way of adding a JavaScript post-aggregator

postAggregation : {
  "type": "javascript",
  "name": <output_name>,
  "fieldNames" : [<aggregator_name>, <aggregator_name>, ...],
  "function": <javascript function>
}
  • output_name is the way the aggegrated field will be displayed, holds the result of the javascript function
  • aggregator_name would be ,in your case, ‘price/unit’ and ‘units’ which act as input for the javascript function
  • javascript function is simply the function you want to have in javascript

a littel below you can find an example

{
  "type": "javascript",
  "name": "absPercent",
  "fieldNames": ["delta", "total"],
  "function": "function(delta, total) { return 100 * Math.abs(delta) / total; }"
}

I just realized that you talked about the JavaScript Aggregations. I’m sorry for that confusion on my side.

So like it stated in the first post, I don’t have used the JavaScript Aggregations/Post-Aggregations so far. If you had look on the Aggregations doc a little further down you see this example JavaScript Aggregator.

{
  "type": "javascript",
  "name": "sum(log(x)*y) + 10",
  "fieldNames": ["x", "y"],
  "fnAggregate" : "function(current, a, b)      { return current + (Math.log(a) * b); }",
  "fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset"     : "function()                   { return 10; }"
}
  • name is output dimension name
  • fieldnames is the dimension list of function inputs
  • fnAggregate : is the function how you aggregate values in on row (if I understand it correct)
  • fnCombine: is the function how to aggregate the different rows
  • fnReset: initial value for the aggregation
    Let me try to create a JavaScript Aggregator for your example:
{
  "type": "javascript",
  "name": "revenue_Sum('price/unit' * units)",
  "fieldNames": ["price/unit", "units"],
  "fnAggregate" : "function(current, a, b)      { return current + ('price/unit' * 'units'); }",
  "fnCombine"   : "function(partialA, partialB) { return partialA + partialB; }",
  "fnReset"     : "function()                   { return 0; }"
}

I hope I finally could help you a little bit.

With best regards

Martin Schneider