MIN and MAX functions on Druid string type dimension columns

Hi,

I am using Druid as a backend to our Tableau visualization layer using Avatica JDBC driver. Live queries are generated in Tableau and sent to Druid via Avatica. Unfortunately, there are few queries that apply MAX and MIN aggregation functions on string type dimension columns while using GROUP BY. Is there a way to make this work? Can I use lookups where I can keep a numeric type as key and apply MAX or MIN on that?

Thanks!

Manu

Hi,

Checking again. Does anyone have any idea on how to it? Help will be much appreciated

Thanks!
Manu

Can you ingest the same column as a string and as a numeric type?

Hi Man,

Tableau supports pass through functions. You can use them to do specific computations.

ex:

RAWSQLAGG_STR(“DB_SPECIFIC_CONCAT(%1,%2)”, [arg1], [arg2])

For reference:

https://help.tableau.com/current/pro/desktop/en-us/functions_functions_passthrough.htm

If this does not work, please elaborate your use case.

Regards, Chari.

Thanks, Chari. I will try to explain my use case in details

Assume my table has below columns

col1 int (dimension)

col2 string (dimension)

col3 string (dimension)

col4 long (metric)

If I run a GROUP BY expression like below, I am getting a failure from Druid

SELECT col1, MAX(col2), MAX(col3), SUM(col4) GROUP BY col1

As I understand, this is because Druid does not support aggregate functions on string type

Thanks!

Manu

Hi Manu,

You are right, max/min is not supported on non-numeric.

Regards, Chari.

Hi Manu,

Have u tried casting to integer and then take max?

Thanks

Hi Tijo,

Thanks for your response. Yes, I have tried casting the string column to int and taking a max. The query is running without any errors but the max value returned is 0 for all columns.

Thanks!

Manu

Is it possible to attach ur ingestion spec, some sample data and query you used. I hope performance is not a big deal for you. I think you cant get the same performance as the value is a measure.

Sure. Attaching the ingestion spec and sample data. This is synthetic data that we are using for testing now. The query I have tried are listed below, which basically tries to run aggregation on string column “analysis_period”

Query 1: Failure while executing

select company,MAX(analysis_period) AS a_period from druid_test group by company;

``

Query 2: Query successful but gives 0 as the output of MAX

select company,MAX(CAST(analysis_period AS INTEGER)) AS a_period from druid_test group by company;

``

druid_parquet_ingest_spec (3.05 KB)

query-731cd4a4-2c97-4dea-bacc-4f4fef99e063.csv (2.81 KB)