Issues with joins on cachedNamespace lookups

When I run join queries on cachedNamespace -jdbc- lookups some queries work and others don’t.

  • Query 1 (Works Only with GroupBy)
select sr.siteid, lr.v as sitename
from atc_battery_readings sr
inner join lookup.sitename lr on lr.k = cast(sr.siteid as varchar)
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
group by 1,2 

Without GroupBy --> Error: dataSource is not joinable: LookupDataSource{lookupName='sitename'}

  • Query 2 (Works Only with GroupBy)
select siteid, lookup(cast(siteid as varchar),'sitename') as sitename
from atc_battery_readings
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
group by 1,2 

Without GroupBy --> Error: Unknown exception Lookup [sitename] not found

  • Query 3
select sr.siteid, lr.v as sitename
from atc_battery_readings sr
inner join lookup.jdbctest lr on lr.k = cast(sr.siteid as varchar)
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
group by 1,2

Error: dataSource is not joinable: LookupDataSource{lookupName='jdbctest'}


Things I've tried
  • I made sure these 2 params are defined across the cluster
druid.lookup.lookupTier=__default
druid.lookup.lookupTierIsDatasource=false
  • Running a Cron job to populate Map lookups instead of cachedNamespace, everything works fine in that scenario.
Logs
org.apache.druid.sql.http.SqlResource - Failed to handle query: SqlQuery{query='select
druid-broker     |   sr.siteid,
druid-broker     |   lookup(cast(sr.siteid as varchar),'sitename') as sitename
druid-broker     | from sitedata sr
druid-broker     |
druid-broker     |
druid-broker     | ', resultFormat=ARRAY, header=true, context={sqlOuterLimit=100}, parameters=[]}
druid-broker     | org.apache.druid.query.QueryInterruptedException: Lookup [sitename] not found
druid-broker     |      at org.apache.druid.client.JsonParserIterator.convertException(JsonParserIterator.java:268) ~[druid-server-0.21.1.jar:0.21.1]
druid-broker     |      at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:183) ~[druid-server-0.21.1.jar:0.21.1]
druid-broker     |      at org.apache.druid.client.JsonParserIterator.hasNext(JsonParserIterator.java:93) ~[druid-server-0.21.1.jar:0.21.1]
druid-broker     |      at org.apache.druid.java.util.common.guava.BaseSequence.makeYielder(BaseSequence.java:89) ~[druid-core-0.21.1.jar:0.21.1]

Relates to Apache Druid <0.22.1>

Welcome @Ahmad_Eldefrawy! Are you doing streaming ingestion or batch ingestion?

Hello Mark, Thank for getting back.
Both actually.But mostly tested with real-time stream ingestion

Thanks! My initial thought is that it might have to do with this, but I might be assuming too much. In researching your question, I also came across this Introduction to JOINs in Apache Druid article. It provided some context for my initial thought.

Let me know if any of that helps.

I have not experimented with lookups yet, but I’m wondering whether the CAST is having an effect on this.
Even so, it sounds like a bug, but perhaps you can find a workaround by applying the CAST at the source rather than at query time.

You can help by documenting the bug and the workaround at Issues · apache/druid · GitHub

1 Like

Unfortunately couldn’t get it to work with casting either.

A work around was used by a cron job to download the lookup table and ingest it into Druid as a map instead of Cached namespace. Will update here if we test it with 0.22.1

The behaviour is the same in 0.22.1 in my testing

@Ahmad_Eldefrawy thanks for the testing and the update.

Quick thought that you could maybe use a lookup function at ingestion time in your transformsSpec – it will save you memory and processing at query-time, too - if the values are settled at ingestion time it’s a good approach … I just mention it as it’s a string function that not many people know exists…