Need help understanding aggregation results

Hi, list.

I have a datasource where all numeric fields are aggregated (min, max, sum, and count). But sometimes I get strange results that I need help understanding.

For instance, this is a query result where I did a groupBy on "myDimension" and got all aggregations for "myAtt".

{
  "version" : "v1",
  "timestamp" : "2016-06-03T19:00:00.000Z",
  "event" : {
    "myAttCount" : 40,
    "myAttLongMin" : 1,
    "myAttLongMax" : 1,
    "myAttLongSum" : 62,
    "myDimension" : "value"
  }
}

If minValue is 1, maxValue is 1, and count is 40, the sum should be 40, right? What am I missing?

Hey Marcelo,

Could you attach the ingestion spec and query you used?

Sure. The query is:

{"queryType": "groupBy",
"dataSource": "myDatasource",
"granularity": "hour",
"dimensions": ["myDimension"],
"aggregations": [
        {"type": "longSum", "name": "myAttLongSum", "fieldName":
"myAttLongSum"},
        {"type": "longMin", "name": "myAttLongMin", "fieldName":
"myAttLongMin"},
        {"type": "longMax", "name": "myAttLongMax", "fieldName":
"myAttLongMax"},
        {"type": "count", "name": "myAttCount", "fieldName": "myAttCount"}
],
"intervals": ["2010-03-21T00:00/2020-01-01T00"]
}

And the ingestion spec is:

{
  "type" : "index_realtime",
  "id" : "index_realtime_telecom_2016-06-06T13:45:00.000Z_0_0",
  "resource" : {
    "availabilityGroup" : "myDatasource-45-0000",
    "requiredCapacity" : 1
  },
  "spec" : {
    "dataSchema" : {
      "dataSource" : "myDatasource",
      "parser" : {
        "type" : "map",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "startTime",
            "format" : "auto",
            "missingValue" : null
          },
          "dimensionsSpec" : {
            "dimensions" : [ "myDimension", *// more then 44 dimensions *],
            "spatialDimensions" :
          }
        }
      },
      "metricsSpec" : [ {
        "type" : "count",
        "name" : "myAttCount"
      }, {
        "type" : "longMax",
        "name" : "myAttLongMax",
        "fieldName" : "myAtt"
      }, {
        "type" : "longMin",
        "name" : "myAttLongMin",
        "fieldName" : "myAtt"
      }, {
        "type" : "longSum",
        "name" : "myAttLongSum",
        "fieldName" : "myAtt"

* // There is more than 250 metrics and for each one we create
all aggregation types*

      } ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "FIVE_MINUTE",
        "queryGranularity" : {
          "type" : "all"
        }
      }
    },
    "ioConfig" : {
      "type" : "realtime",
      "plumber" : null,
      "firehose" : {
        "type" : "clipped",
        "interval" : "2016-06-06T13:45:00.000Z/2016-06-06T13:50:00.000Z",
        "delegate" : {
          "type" : "timed",
          "shutoffTime" : "2016-06-06T13:56:00.000Z",
          "delegate" : {
            "type" : "receiver",
            "serviceName" : "druid:firehose:telecom-45-0000-0000",
            "bufferSize" : 100000
          }
        }
      }
    },
    "tuningConfig" : {
      "shardSpec" : {
        "type" : "linear",
        "partitionNum" : 0
      },
      "rejectionPolicy" : {
        "type" : "none"
      },
      "buildV9Directly" : false,
      "maxPendingPersists" : 0,
      "intermediatePersistPeriod" : "PT10M",
      "windowPeriod" : "PT1M",
      "type" : "realtime",
      "maxRowsInMemory" : 75000
    }
  }
}

Hi, everyone.

Does anyone else have this problem? Perhaps it is related to the roll-up process?

Marcelo, I don’t understand your question. Your query explicitly lists the aggregations you want and you are getting them in the result. What is the problem?

Marcelo, sorry, I misread your question. Your ingestion spec has an aggregation that counts the number of rows ingested, but your query is counting the number of Druid rows after rollup.
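
To make this concrete, here is a hypothetical illustration (the field names follow your spec; the events are made up). Suppose two raw events share the same dimension values:

{"startTime": "2016-06-03T19:00:01Z", "myDimension": "value", "myAtt": 1}
{"startTime": "2016-06-03T19:00:02Z", "myDimension": "value", "myAtt": 1}

Rollup merges them into a single Druid row, and the ingestion-time count records how many raw events were merged into it:

{"myDimension": "value", "myAttCount": 2, "myAttLongMin": 1, "myAttLongMax": 1, "myAttLongSum": 2}

A query-time count over that data returns 1 (one Druid row), while a longSum over myAttCount returns 2 (two ingested events). Your result is consistent with 62 events, each with myAtt = 1, being rolled up into 40 Druid rows.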

Thanks, Fangjin.

So it is normal to expect those results, right? There is nothing wrong with my ingestion spec or query?

It depends on what you are trying to measure.

Finding the number of ingested events is typically done with a count aggregator in the ingestion spec and a longSum aggregator in the query spec over the column that count aggregator created. Comparing the number of Druid rows against the number of ingested events can be used to measure how effective your rollup is.
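
As a sketch, the query-time aggregations for that comparison could look like this (the output names "druidRows" and "ingestedEvents" are just illustrative; "myAttCount" is the count column from your ingestion spec):

"aggregations": [
  {"type": "count", "name": "druidRows"},
  {"type": "longSum", "name": "ingestedEvents", "fieldName": "myAttCount"}
]

The ratio of ingestedEvents to druidRows tells you how many events, on average, were rolled up into each Druid row.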

Hey Marcelo,

In your case specifically, you probably want to use {"type": "longSum", "name": "myAttCount", "fieldName": "myAttCount"} at query time rather than {"type": "count", "name": "myAttCount"}. You should still use the type "count" at ingestion time.

The idea is that because Druid aggregates your data as it ingests it, counts taken at ingestion time are stored as values that you should sum at query time.
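
Applied to the query you posted, a sketch of the corrected aggregations would be:

"aggregations": [
  {"type": "longSum", "name": "myAttLongSum", "fieldName": "myAttLongSum"},
  {"type": "longMin", "name": "myAttLongMin", "fieldName": "myAttLongMin"},
  {"type": "longMax", "name": "myAttLongMax", "fieldName": "myAttLongMax"},
  {"type": "longSum", "name": "myAttCount", "fieldName": "myAttCount"}
]

Since your min and max are both 1, every ingested myAtt value was 1, so this query should return 62 for myAttCount, matching myAttLongSum.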

Hi, Gian and Fangjin.

Thanks a lot.