SQL Order By Unknown exception

Hello to all,

I am a new user of Druid SQL, but generally quite familiar with SQL queries in several databases.

I am executing the following CTE with several transformations. However, I can not perform an order at the final query of my CTE.

You can see the following mock example:

WITH xx_orders AS (
  SELECT
    xx_dim
    , SUM(CASE WHEN field=some_value THEN 1 ELSE 0 END) AS xx_num_orders
    , "USER" AS user_xx
  FROM orders
  GROUP BY xx_dim
),

total_xx_orders AS (
  SELECT
    SUM(xx_num_orders) AS xx_num_orders_gen
    , "USER" AS user_xx
  FROM xx_orders
)

SELECT
  xxo.xx_dim
  , xxo.xx_num_orders
  , CASE WHEN xxo.xx_num_orders != 0 THEN (CAST(xxo.xx_num_orders AS DECIMAL) / CAST(txxo.xx_num_orders_gen AS DECIMAL)) * 100 ELSE 0 END AS xx_user_orders_percentage
FROM xx_orders AS xxo
LEFT JOIN total_xx_orders AS txxo
  ON xxo.user_xx = txxo.user_xx
ORDER BY rco.xx_num_orders

I am getting the following error:

Error: Unknown exception

Cannot build plan for query: WITH xx_orders AS...

I am doing this join, as I can not use any Window Function in Druid SQL.

Please, do you have any idea why I can not use the final order by?

Thanks in advance,

George

Hey George :slight_smile:

Not sure specifically what the issue is here, but note that as your JOIN is between two table datasources, the Druid Sql interpreter is going translate this into sub-queries in order to get this to work. That could be the source of the issue as Druid is trying to do something behind the scenes with your data that isn’t what you expect.

Re: join anyway, it’s worth reading this article that explains how JOIN / SUBQUERY works in Druid at the moment, and what you need to be aware of re: execution.

To that point, noting that Druid always likes to address just one table, are you able to flatten this before it hits Druid? Ie, so you don’t need a JOIN at all?

Also, have you just removed your __time clause for brevity? (I would expect to see that in a Druid-type query.)

For optimal execution, you may also want to just use a transform function at ingestion time to save having to calculate that CASE statement so it’s only evaluated once instead of at every query time.