How to count the result of a "groupBy"?

I can get the result of a groupBy, but how do I count the rows in that result?
My JSON query is:

{
    "queryType": "timeseries",
    "dataSource": {
        "type": "query",
        "query": {
            "queryType": "groupBy",
            "dataSource": "sitestat",
            "granularity": "all",
            "dimensions": ["userid"],
            "filter": {
                "type": "and",
                "fields": [
                    { "type": "selector", "dimension": "tid", "value": "4008006676" },
                    { "type": "selector", "dimension": "host", "value": "dota2.vpgame.com" }
                ]
            },
            "aggregations": [
                { "type": "longSum", "name": "pv", "fieldName": "pv" }
            ],
            "having": {
                "type": "and",
                "havingSpecs": [
                    { "type": "greaterThan", "aggregation": "pv", "value": 5 }
                ]
            },
            "intervals": [ "2016-05-27/2016-05-28" ]
        }
    },
    "granularity": "all",
    "aggregations": [
        { "type": "count", "name": "rows" }
    ],
    "intervals": [ "2016-05-27/2016-05-28" ]
}

I want the equivalent of this SQL:

select count(*) from ( select userid, sum(pv) as pv from sitestat where tid='4008006676' and host='dota2.vpgame.com' group by userid having sum(pv) > 5 ) as mytable

But the "rows" value I get back is the row count of the underlying "dataSource": "sitestat", that is, select count(*) from sitestat. Why?

Nested queries (dataSource type “query”) are currently only supported when both the outer and inner query are groupBys. Could you try rewriting your outer timeseries as a groupBy and see if that works?

Thanks very much. I tried it like this:

{
    "queryType": "groupBy",
    "dataSource": {
        "type": "query",
        "query": {
            "queryType": "groupBy",
            "dataSource": "sitestat",
            "granularity": "all",
            "dimensions": ["userid"],
            "filter": {
                "type": "and",
                "fields": [
                    { "type": "selector", "dimension": "tid", "value": "800056379" }
                ]
            },
            "aggregations": [
                { "type": "longSum", "name": "pv", "fieldName": "pv" },
                { "type": "longSum", "name": "vv", "fieldName": "vv" },
                { "type": "longSum", "name": "cost_time", "fieldName": "cost_time" }
            ],
            "having": {
                "type": "and",
                "havingSpecs": [
                    { "type": "greaterThan", "aggregation": "pv", "value": 1 },
                    { "type": "greaterThan", "aggregation": "vv", "value": 1 },
                    { "type": "greaterThan", "aggregation": "cost_time", "value": 10 }
                ]
            },
            "intervals": [ "2016-05-26/2016-05-27" ]
        }
    },
    "granularity": "all",
    "dimensions": null,
    "aggregations": [
        { "type": "count", "name": "rows" }
    ],
    "intervals": [ "2016-05-26/2016-05-27" ]
}

I can now get the count of the inner groupBy, but it takes too long, about 3-4s. Is there a more efficient method?

On Friday, May 27, 2016 at 3:56:08 PM UTC+8, xuzhe wrote:

How long does the inner query, by itself, take? I’m wondering how much of the 3–4s query time is due to the inner query vs the outer query.

Thanks for the answer, Gian.

The inner query takes about 3s by itself. We got these metrics from the log:

query/cpu/time: 800ms

query/time: 2000ms

There are 100-200 million events in the datasource.

On Wednesday, June 1, 2016 at 6:39:13 AM UTC+8, Gian Merlino wrote:

Okay, good to know. What sort of performance are you hoping for?

If you're brave, you can try this patch and see if it helps: https://github.com/druid-io/druid/pull/2998. It adds a new groupBy strategy that you can enable by passing "groupByStrategy": "epinephelinae" in your query context. You will also have to set druid.processing.numMergeBuffers=X in your config, where X is something non-zero (2 should work fine for testing).
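For illustration, here is a minimal sketch of how that context flag might be attached to a query. It reuses the inner groupBy from earlier in the thread, trimmed to a single aggregation, so it can be timed on its own. The strategy name is taken from the reply above and comes from an unmerged PR, so it may change before release. Whether setting the context on the outer query of a nested query is enough for it to reach the inner query is an assumption that would need testing.

```json
{
    "queryType": "groupBy",
    "dataSource": "sitestat",
    "granularity": "all",
    "dimensions": ["userid"],
    "filter": { "type": "selector", "dimension": "tid", "value": "800056379" },
    "aggregations": [
        { "type": "longSum", "name": "pv", "fieldName": "pv" }
    ],
    "having": { "type": "greaterThan", "aggregation": "pv", "value": 1 },
    "intervals": [ "2016-05-26/2016-05-27" ],
    "context": { "groupByStrategy": "epinephelinae" }
}
```

With the patched build, the matching config line from the reply above would be druid.processing.numMergeBuffers=2 for testing.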

Thank you very much for the answer, Gian.

I have read https://github.com/druid-io/druid/pull/2998, and it may be useful. I would like to try the patch, but my current Druid version is 0.9.0. How do I use the patch? I could not find any documentation describing it.

On Wednesday, June 1, 2016 at 11:33:07 AM UTC+8, Gian Merlino wrote:

If you're comfortable building Druid from patched sources, you can download the patch (https://github.com/druid-io/druid/pull/2998.patch) and apply it yourself, or you can check out the branch the PR was sent from and build Druid by running "mvn package". If you would prefer to build from master or use a released version, you can wait for the PR to be merged and appear in a release.