Is aggregation (count) on dimension but not on metrics supported by Druid?

Hi friends,

I have a question which blocks me for days, I asked it on stackoverflow, and copy it here, can anyone help me with this? Thanks in advance.

Quick link:

http://stackoverflow.com/questions/33909985/is-aggregation-count-on-dimension-but-not-on-metrics-supported-by-druid

For example there are two dimensions: [country, website] and one metric: [PV].

I want to know the average PV of website for each country.

To make it, it’s easy to get the total PV in each country, however it’s difficult to get the count of website in each country, furthermore the expect result is the total PV(in each country) divided by the count of website(in each country).

What I can do is apply “groupBy” query by country & website as below, and then group the result by country outside in my application. It’s very very very slow, because the query extract lots of data from Druid and most of them is meaningless just for a sum.

{
    "queryType": "groupBy",
    "dataSource": "--",
    "dimensions": [
        "country",
        "website"
    ],
    "granularity": "all",
    "intervals": [
        "--"
    ],
    "aggregations": [
        {
            "type": "longSum",
            "name": "PV",
            "fieldName": "PV"
        }
    ]
}

Any one can help with this? I’m wondering it’s impossible such a common query is not supported by Druid.

Thanks in advance.

Could anyone help with this? It’s highly appreciated.

在 2015年11月25日星期三 UTC+8下午7:35:28,Junjie写道:

Hi,

You can probably do a flat group-by query for this like below…

{
“queryType”: “groupBy”,
“dataSource”: “–”,
“dimensions”: [
“country”
],
“granularity”: “all”,
“intervals”: [
“–”
],
“aggregations”: [
{
“type”: “longSum”,
“name”: “PV”,
“fieldName”: “PV”
},
{
“type”: “cardinality”,
“name”: “websites_unique_count”,
“fieldName”: “websites”
},
],
“postAggregations”: [
{
“type”: “arithmetic”,
“name”: “avg_pv”,
“fn”: “/”,
“fields”: [
{ “type”: “fieldAccess”, “fieldName”: “PV” },
{ “type”: “fieldAccess”, “fieldName”: “websites_unique_count” }
]
}
]
}

learn more about “cardinality” aggregator at http://druid.io/docs/0.8.2/querying/aggregations.html#cardinality-aggregator

– Himanshu

@Himanshu Gupta

I tried your query many times but got nothing (no result and no error), I have no authorities to checkout the log on Druid server side.

I installed the another Duird server on my local host and “wikipedia” example is running on it.

Let’s focus on “cardinality”.

Familiar query is:

{

"queryType": "groupBy",

"dataSource": "wikipedia",

"granularity": "all",

"dimensions": ["language"],

"aggregations": [

    {"type": "**cardinality**", "name": "user_distinct_count", "fieldNames": ["user"]}

],

"intervals": ["2010-01-01T00:00/2020-01-01T00"]

}

And the part of result is like:

[ {

“version” : “v1”,

“timestamp” : “2010-01-01T00:00:00.000Z”,

“event” : {

"language" : "de",

"user_distinct_count" : "AQAAawAAAAAPAQAVUAAWEAAkAQAqIAArAwAwAQA4AQA7AQA9IAA/IABVcQBgIABlBQBsIABwAwCFAQCZAgCcAQCeIACkAgCrIACtAwCuEAC0EADABADDEADUIADhQADpIADvEAD4EAEKEAETIAEUEAEYAQEpAQEtEAExAwEzEAFBEAFREAFSIAFmAgF6EAGEAgGSEAGVAQGWAwGbAwGmQAG2AwG4AQHFAwHIEAHOAQHmEAH0AQH/AQIAAgIIQQIhAQIjEQIkAQIoAQIxIgJlEAJrEAJ2EAKKEAKTAgKwAQK0AQK4MAK9EALQEALSBQLYAQLgAwL5BQMLAwMMAQMbEAM8IANOMQNdAgNvAwOWMAOoMAOqMAOsEAPEBAPJEAPKAwPeAgPjAgPkEAPlAQPoEAP0FAP6BwAAAAAAAAAAAAAAAAAAAAAAAA=="

}

},…

So, the count of the user is not a Integer value but a strange string? Could you help with this furthermore ?

Thanks a lot.

在 2015年11月26日星期四 UTC+8上午10:43:30,Himanshu Gupta写道:

Hi Himanshu,

I found that “cardinality” works fine on “timeseries” query, that means I have to get count of websites country by country via “filter” which is almost impossible.

Could you help to give more advices?

Thanks.

在 2015年11月26日星期四 UTC+8上午10:43:30,Himanshu Gupta写道:

Junjie, have you tried just using SQL? https://github.com/implydata/plyql

Just use a high level query planner for Druid, the low level native Druid language has bugs.