How to perform a SELECT in the results returned from a GROUP BY Druid?

I am having a hard time converting this simple SQL Query below into Druid :

SELECT name, email, street, city, country

FROM people_data

WHERE name=“Mary”

GROUP BY country, city;

So I came up with this query so far:

{

“queryType”: “groupBy”,

“dataSource”: “people_data”,

“granularity”: “day”,

“dimensions”: [“country”, “city”],

“filter”: {

"type": "in",

"dimension": "name",

"values": ["Mary"]

},

“aggregations” : [ //This is required in a groupBy so I just put something random here for the aggregation

{ "type": "longSum", "name": "total_age_per_country_city_test", "fieldName": "age" }

]

“intervals”: [ “2016-07-01T00:00:00.000/2016-07-03T00:00:00.000” ],

“queryType” : “select” {

   //Can I do something like this? Like embed in the select after the groupBy query is done? I am getting an error whenever I do this. I am not sure how to grab the data from the groupBy query and use that to select the name, email, street, city, and country.

}

}

Now what I want to do is SELECT the name, email, street, city, country from the groupBy query above. How do I grab the data from this groupBy query and select the name, email, street, city, country columns?

Hi John,

I don’t believe that type of query is supported, only the GroupBy query currently supports nested queries:

http://druid.io/docs/latest/querying/datasource.html

You can also issue SQL queries directly with https://github.com/implydata/plyql

That isn’t a legal SQL query - if you GROUP BY country and city, you can’t just SELECT the name/email/street - they’re not included in the aggregation.
What are you actually trying to achieve with your query?

Sorry I meant to write this SQL query:

SELECT country, city, Count(*)
FROM people_data
WHERE name="Mary"
GROUP BY country, city;

Actually I realized I can just do this:

{
  "queryType": "groupBy",
  "dataSource" : "people_data",

  "granularity": "all",
  "metric" : "num_of_pages",
  "dimensions": ["country", "city"],
  "filter" : {
      "type" : "and",
      "fields" : [
          {
            "type": "in",
            "dimension": "name",
            "values": ["Mary"]
          }
      ]
  },
  "aggregations": [
    { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
  ],
  "intervals": [ "2016-07-20/2016-07-21" ]
}

Since this will group by the dimension list and also select them. I thought it wouldn’t select the dimensions I pass to that dimension array in the group by query.

Also can someone explain to me the purpose of the metric key? I noticed that you can put any arbitrary value you want there even if that metric doesn’t exist in your dataset (such as num_of_pages). But in the aggregation object the “name” key value has to be the same as the metric key value (num_of_pages). Why is that? Or I am just interpreting it wrong.

You don’t need the metric for a groupby query - you’d use that for a topN query, to decide the ordering of the query results and therefore what Druid uses to work out what is ‘top’.

So the metric key value doesn’t have to be from your actual data source? It is just a placeholder name you use in your aggregation object?