SQL Support for Set Operations on Sketches

All,
We have a use case where we need to do set operations on sketches stored in Druid for a number of dimensions. Since this is not supported out of the box, we use a proxy component to read the sketches out of Druid and submit them to a Spark cluster to do the set operations.

[ Front End Tile ] ----> [ Proxy Component ] <--> Fetch sketches from Druid
                                  |
                                  v
                 Submit a Spark job with the sketches to compute set operations

Of late, we have been looking at whether we can speed up this pipeline. One alternative we are exploring is:

  1. See whether we can achieve the same in Druid.

     a. Enhance the Calcite grammar with sketch-specific syntax and add glue logic to invoke Druid internals.

We have the following questions for you:

  1. Since a SQL interface for sketches is a generic ask, is this an item on the Druid roadmap?

  2. If this has not been done for a specific reason, could we please know that reason?

-thanks

This patch https://github.com/apache/incubator-druid/pull/8487 was just merged into master. It adds Druid SQL support for the HLL, Theta, and quantiles sketches (set operations are supported). The tuple sketch and some of the newer quantiles sketch post-aggregators aren't supported yet; they will be added in a later patch.
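As a rough illustration of what a set operation through Druid SQL could look like from a client, here is a minimal Python sketch. It assumes the druid-datasketches extension is loaded and that the build includes the patch above. The function names THETA_SKETCH_ESTIMATE, THETA_SKETCH_INTERSECT and DS_THETA are taken from the Druid SQL documentation for that extension as I understand it. The table name, column names, filters and broker URL are hypothetical placeholders.

```python
import json
import urllib.request


def build_intersection_query(table, sketch_col, filter_a, filter_b):
    """Build a Druid SQL query that estimates the size of the intersection
    of two filtered Theta sketches.

    All arguments are inserted into the SQL text as-is, so they must be
    trusted identifiers/expressions, not user input.
    """
    return (
        "SELECT THETA_SKETCH_ESTIMATE(\n"
        "  THETA_SKETCH_INTERSECT(\n"
        f"    DS_THETA({sketch_col}) FILTER (WHERE {filter_a}),\n"
        f"    DS_THETA({sketch_col}) FILTER (WHERE {filter_b})\n"
        "  )\n"
        ") AS overlap\n"
        f"FROM {table}"
    )


def build_request(base_url, sql):
    """Wrap the SQL text in a POST request for Druid's SQL HTTP endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url, sql):
    """Send the query and return the decoded JSON rows."""
    with urllib.request.urlopen(build_request(base_url, sql)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical table, column and filters, for illustration only.
    sql = build_intersection_query(
        "events", "user_sketch", "country = 'US'", "device = 'mobile'"
    )
    print(sql)
    # To actually run it, point at your own broker or router, e.g.:
    # print(run_query("http://localhost:8082", sql))
```

If this works for your data, the proxy would only need to issue one SQL query per set operation instead of fetching the sketches and handing them to Spark.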

Does that look useful for your case?

Thanks,

Jon

Hi Jon,
Let me look at the patch and see if it helps us. Thanks for the prompt response.

-venkat

I think the changes are in line with what we need. It looks like https://github.com/apache/incubator-druid/releases/tag/druid-0.16.0-incubating does not have these changes. Can we have an ETA for when the next release candidate will be out? In the meantime, we will try to do a build from the source code.

-venkat

Hi Venkat,
Druid 0.17 is slated for either December or January.

Cheers,

Jad.