Not able to create aggregation query on top of Union Query

Not able to create aggregation query on top of Union Query

Here is the query

select “metric_name”,avg(“AVG1”)
FROM
(select
“metric_name”,
AVG (“metric_value”) as “AVG1”
from “Table1”
where “metric_name”= ‘metric1’
group by “site_name”,
“metric_name”
UNION ALL
select
“metric_name”,
AVG (“metric_value”) as “AVG1”
from “Table1”
where “metric_name”= ‘metric1’
group by “site_name”,
“metric_name”
)
GROUP by “metric_name”
LIMIT 100

Error I am getting is:

Possible error: SQL requires union between inputs that are not simple table scans and involve a filter or aliasing. Or column types of tables being unioned are not of same type.

Please let me know if such a query is not possible in SQL. Is this possible in JSON query and if not how are such use cases served in Druid?

There are currently limitations on what can be in a UNION for subqueries. Essentially you cannot do aggregation on the subquery if you are using UNION.
What are you trying to achieve with this query?
In this particular case, it seems you could just do this in a single query replacing the UNION approach with an IN clause in the internal query, here’s an example with the wikipedia data:

SELECT "page", AVG("AVG1")
FROM
(
SELECT "page",
   AVG ("added") as "AVG1"
FROM "wikipedia"
 WHERE "page" IN ('Amsterdam','Ayrton Senna')
GROUP BY "channel", "page"
)x
GROUP BY "page"
ORDER BY 2 DESC

Additionally, this discussion sheds some light as to why this is the case for now : Select from union subquery fails due to column order · Issue #10221 · apache/druid · GitHub

Thanks for your respose.
In my use case the two queries that I am trying to query different time windows.
I need this kind of query to get two data points for two time windows ,so I can show them in superset widget

Okay, I think this can be achieved with a similar strategy with a single pass at the table, while using aggregation expression filters to calculate the metrics for each time period :

SELECT "countryName", 
       SAFE_DIVIDE(SUM("sum_period_1"),SUM("cnt_period_1")) as "avg_period_1",
       SAFE_DIVIDE(SUM("sum_period_2"),SUM("cnt_period_2")) as "avg_period_2",
       SUM("cnt_period_2"+"cnt_period_1") as "total_activity"
FROM
(
SELECT "countryName",
   SUM ("added") FILTER (WHERE __time BETWEEN '2016-06-27T07:00:00' AND '2016-06-27T08:00:00') as "sum_period_1",
   COUNT(1) FILTER (WHERE __time BETWEEN '2016-06-27T07:00:00' AND '2016-06-27T08:00:00') as "cnt_period_1",
   SUM ("added") FILTER (WHERE __time BETWEEN '2016-06-27T08:00:00' AND '2016-06-27T09:00:00') as "sum_period_2",
   COUNT(1) FILTER (WHERE __time BETWEEN '2016-06-27T08:00:00' AND '2016-06-27T09:00:00') as "cnt_period_2"
FROM "wikipedia"
 WHERE 
 (
        __time BETWEEN '2016-06-27T07:00:00' AND '2016-06-27T08:00:00' -- from 7am to 8am
    OR
        __time BETWEEN '2016-06-27T08:00:00' AND '2016-06-27T09:00:00' -- from 8am to 9am
 )
 
GROUP BY "channel", "countryName"
)x
GROUP BY "countryName"
ORDER BY 4 DESC

Different filters on each expression plays a very similar role to the UNION approach you were attempting, the advantage here is that there is single pass at the table.

Thanks Sergio, this was immensely helpful.