Druid where clause after nested query

Hi Experts ,
I am not sure of the behavior i am seeing in Druid 0.18

SELECT count(*) FROM (SELECT avg(price) as price,SupplierID FROM [Products] group by SupplierID) where price > 20

Above query returns empty result even when there are rows with price greater than 20 . Any aggregate functions like Count,SUM etc returns empty result

But below query returns the result.

SELECT * FROM (SELECT avg(price) as price,SupplierID FROM [Products] group by SupplierID) where price > 20

Anything i am missing here

Thanks in advance

Hello -

Your WHERE clause should be inside the parentheses. Does that work then?

Yes it works inside the inner clause as expected but when its outside it does not work with operators like Count, and Sum.
We can use having clause instead but i am still curious to know why the above one does not work.

Thanks

To be clear we need price>20 in the above example after the group by thats why where clause is in outer query. Using HAVING Clause after groupby would do the same functionality as the above first query but still as per SQL approach WHERE clause in outer query should work too. Hence the question .

Hm I wonder if it’s because the outer price is not in scope? Ie, the outer query is not able to address the “price” from the inner query?

Peter,
But we do select avg(price) as price in the inner query which should be within scope of outer query right?

And it does work if we use select * in the. outer query (refer second query above) . it doesnt work only if we use count or sum

For laughs, could you try “AS bananas” instead of “AS price” and see what happens?

Ha! no luck with “AS Bananas” too

Sorry I misunderstood initially. I think Peter may be right. Maybe once the count(*) is calculated, it just looks at that result for price>20, but there’s no price to look at there. Ie, the WHERE is applied to the resultset, not the inner query, since that’s where it’s placed. In your second query, there is a price value to filter with.

Ben ,
But where clause precedes select in the order of operators so shouldnt the count() applied after where clause . If count() is applied first then even Select * should have resulted with empty result but we do get result if we use select * instead of select count(*)

Also not just Select * it works even if other columns are selected in inner query and is used like Select columnotherthanprice is used . Only select count(*) or sum() does not work.

To be clear something like this works

SELECT anotherCol FROM (SELECT avg(price) as price, avg(anotherCol) as anotherCol, SupplierID FROM [Products] group by SupplierID) where price > 20

I thought more about this, and see the logic of why the results should be as you expected. I tested with 0.18, and it seemed to work OK. Eg, with my table,

select count(*) from (select sum(m1_int1_min) as summ, m2_int3_max from dataTest group by 2)

returned 99, but

select count(*) from (select sum(m1_int1_min) as summ, m2_int3_max from dataTest group by 2) where summ > 3

returned 48.

Are these not the same kind of logic you’re talking about?

I ran those with 0.18.0-iap15 , by the way.

Thanks Ben . My bad so we had simplified the query here thinking CASE clause should not matter but we did some more trials and turned out since we were using CASE in inner query its not working
Something like this does not work. only when case clause is used for the column which is used in outer where clause it does not work and that too only with aggregators like count() or sum() in outer query
SELECT count(*) FROM (SELECT CASE when sum(price)>0 then avg(price) else -1 end as price , SupplierID FROM [Products] group by SupplierID) where price > 20

Something like this works (same query but with Select * instead of select count(*))
SELECT * FROM (SELECT CASE when sum(price) >0 then avg(price) else -1 end as price ,SupplierID FROM [Products] group by SupplierID) where price > 20

I see what you mean. I reproduced on wikipedia sample data. Eg

select (summ_added) from (select case when sum(sum_added) > 10000 then sum(sum_added) else -1 end as summ_added, channel from wikipedia group by 2) where summ_added > -1

returns 41 rows, but

select count(summ_added) from (select case when sum(sum_added) > 10000 then sum(sum_added) else -1 end as summ_added, channel from wikipedia group by 2) where summ_added > -1

returns no data. Same for avg, sum.

This seems like a bug to me, in how the query is being translated. I haven’t tested with the latest version yet, but I filed a bug report for 0.18: https://github.com/apache/druid/issues/10557

Thanks for the help Ben.