SQL group by with having

Hi,

I am executing the following SQL, which is working fine.

select Category, CountryName, Store_Name,
  APPROX_COUNT_DISTINCT(prod_hyper) as apx_hyper_prod,
  APPROX_COUNT_DISTINCT_DS_HLL(Product_id, 21, 'HLL_8') as apx_theta_prod
from retail_data_rollup_index_hyper
where TIME_EXTRACT("__time", 'day') > 0
group by Category, CountryName, Store_Name

But if I add a HAVING clause to the above SQL, it runs out of memory.

select Category, CountryName, Store_Name,
  APPROX_COUNT_DISTINCT(prod_hyper) as apx_hyper_prod,
  APPROX_COUNT_DISTINCT_DS_HLL(Product_id, 21, 'HLL_8') as apx_theta_prod
from retail_data_rollup_index_hyper
where TIME_EXTRACT("__time", 'day') > 0
group by Category, CountryName, Store_Name
having apx_hyper_prod != apx_theta_prod

I think the approximate count computation is getting evaluated twice, once in the SELECT list and once in the HAVING clause, and that this is what causes the out-of-memory error. I don't know how to validate and confirm this. Just thought of sharing it with the group.
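One workaround I am considering (just a sketch, not yet verified; whether the planner really computes the sketches only once would need to be checked) is to move the aggregation into a subquery and filter in the outer query instead of using HAVING:

select * from (
  select Category, CountryName, Store_Name,
    APPROX_COUNT_DISTINCT(prod_hyper) as apx_hyper_prod,
    APPROX_COUNT_DISTINCT_DS_HLL(Product_id, 21, 'HLL_8') as apx_theta_prod
  from retail_data_rollup_index_hyper
  where TIME_EXTRACT("__time", 'day') > 0
  group by Category, CountryName, Store_Name
)
where apx_hyper_prod != apx_theta_prod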

Regards, Chari.

Hi Chari,

Running EXPLAIN PLAN on the query could give some information to confirm this (see the example after the documentation link below).

https://druid.apache.org/docs/latest/querying/sql
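For example, something along these lines (just a sketch reusing your query; I have not run it myself):

EXPLAIN PLAN FOR
select Category, CountryName, Store_Name,
  APPROX_COUNT_DISTINCT(prod_hyper) as apx_hyper_prod,
  APPROX_COUNT_DISTINCT_DS_HLL(Product_id, 21, 'HLL_8') as apx_theta_prod
from retail_data_rollup_index_hyper
where TIME_EXTRACT("__time", 'day') > 0
group by Category, CountryName, Store_Name
having apx_hyper_prod != apx_theta_prod

The plan output shows the native query Druid would run, which should tell you whether the sketch aggregations end up being computed once or twice.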

Thanks & Rgds

Venkat

Hi Venkat,

When I execute the following query, I am getting the exception below. I read through the documentation and I think the syntax is correct. Can you please suggest what is going wrong?

EXPLAIN PLAN FOR select ReportingCategory, Store_CountryName from retail_final_hadoop_json

Unknown exception / Encountered "( EXPLAIN" at line 1, column 15. Was expecting one of: <QUOTED_IDENTIFIER> ... <BACK_QUOTED_IDENTIFIER> ... <BRACKET_QUOTED_IDENTIFIER> ... <UNICODE_QUOTED_IDENTIFIER> ... "LATERAL" ... "(" ... "(" "WITH" ... "(" "SELECT" ... "(" "VALUES" ... "(" "TABLE" ... "UNNEST" ... "TABLE" ... / org.apache.calcite.sql.parser.SqlParseException

Regards, Chari.

Hi Chari,

Are you using dsql to run this query? Does the following work for you?

select ReportingCategory, Store_CountryName from retail_final_hadoop_json

I think you are missing double quotes for the datasource name.

Can you please try running the following:

EXPLAIN PLAN FOR select ReportingCategory, Store_CountryName from "retail_final_hadoop_json"

Thanks & Regds

Venkat

Hi Venkat,

A simple SQL statement like "EXPLAIN PLAN FOR SELECT 1 + 1;" from the unified web console results in an org.apache.calcite.sql.parser.SqlParseException.

Issuing the same query via curl, however, works and outputs a plan:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "EXPLAIN PLAN FOR SELECT 1+1"}' \
  http://localhost:8888/druid/v2/sql

[{"PLAN":"BindableValues(tuples=[[{ 2 }]])\n"}]

Inspecting the request payload in the browser's network activity console, the query actually issued is:

"SELECT * FROM (EXPLAIN PLAN FOR SELECT 1 + 1) LIMIT 5000"

It looks like the console is wrapping the query with a "SELECT * FROM ... LIMIT 5000" in this case, which is what is causing the issue.

A curl of the modified query "SELECT * FROM (EXPLAIN PLAN FOR SELECT 1 + 1) LIMIT 5000" also fails.
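For reference, the curl of the modified query looks like this (same endpoint as above):

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM (EXPLAIN PLAN FOR SELECT 1 + 1) LIMIT 5000"}' \
  http://localhost:8888/druid/v2/sql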

Thanks,

Sashi

Raised https://github.com/apache/incubator-druid/issues/8254.

Hi Venkat,

I was trying it in the "Query" tab of the unified console. Anyway, there is an "Explain Plan" option in the query window to look at the query plan.

Thanks.

Regards, Chari.