I am going to use Imply with datasketches data that I generated from an external source. Which column type should I use for it? From the default types I can use only
How can I use datasketches functions (e.g. THETA_SKETCH_ESTIMATE) with an uploaded STRING column? When I try the THETA_SKETCH_ESTIMATE function with
SQL query is unsupported
Take a look at the isInputThetaSketch setting in the druid-datasketches extension docs: DataSketches Theta Sketch module · Apache Druid
I think that is what you are looking for.
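For reference, here is a rough sketch of how the isInputThetaSketch setting is typically used in an open-source Apache Druid ingestion spec. It builds the thetaSketch aggregator entry that would go into the metricsSpec. The column and metric names are made up for illustration, and I am assuming the input column holds Base64-encoded serialized theta sketches. Whether a given distribution exposes this setting is a separate question.

```python
import json

# Hypothetical names, for illustration only.
SKETCH_COLUMN = "user_sketch"   # input column with pre-built theta sketches (assumed Base64-encoded)
METRIC_NAME = "unique_users"    # name of the sketch metric stored in the datasource


def theta_sketch_metric(name, field_name, is_input_sketch):
    """Build a thetaSketch aggregator entry for a Druid metricsSpec."""
    return {
        "type": "thetaSketch",
        "name": name,
        "fieldName": field_name,
        # True tells Druid the input values are already theta sketches,
        # not raw values that still need to be sketched.
        "isInputThetaSketch": is_input_sketch,
    }


input_sketch_metrics_spec = [theta_sketch_metric(METRIC_NAME, SKETCH_COLUMN, True)]
print(json.dumps(input_sketch_metrics_spec, indent=2))
```

The printed JSON is the fragment that would be placed under metricsSpec in the dataSchema of a native ingestion spec.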
Which distribution are you using?
Thanks for the answers. @Mark_Herrera, I am using the Imply Polaris product. It looks like it is impossible to use externally generated datasketches.
I also tried with
But still no success.
After some research, it seems like that is not currently available on Polaris.
Can you tell us a bit about your use case? Where is the data sketch being created? Could you run that calculation as part of the Polaris ingestion?
We use the datasketches library to count unique users and then calculate retention and other metrics. So in our case we are thinking of using Imply Polaris for the aggregations. With different aggregation fields we can get unique users.
I am afraid we can’t load all the unique user IDs into Imply Polaris. There are too many records.
I didn’t mean loading the detailed data. I was thinking instead of doing the aggregation as part of the ingestion. In other words, moving the upstream aggregation that currently creates the sketch into the Druid ingestion, so that you end up with the same result.
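To make that idea a bit more concrete, here is a minimal sketch of what ingestion-time sketching looks like in open-source Apache Druid, assuming rollup is enabled so that only the sketch, not each raw ID, is stored per rolled-up row. The column name user_id, the metric name unique_users, and the datasource name events are hypothetical. The Polaris UI and API may express the same thing differently.

```python
import json

# Hypothetical names, for illustration only.
RAW_ID_COLUMN = "user_id"       # raw user identifier in the input data
SKETCH_METRIC = "unique_users"  # sketch metric built during ingestion
DATASOURCE = "events"           # target datasource / table

# Aggregator for the metricsSpec: Druid builds the theta sketch from raw IDs.
rollup_metrics_spec = [
    {
        "type": "thetaSketch",
        "name": SKETCH_METRIC,
        "fieldName": RAW_ID_COLUMN,
        "isInputThetaSketch": False,  # input is raw values, not pre-built sketches
        "size": 16384,                # sketch size; must be a power of 2
    }
]

# Druid SQL that merges the stored sketches per day and estimates unique users.
daily_users_sql = (
    f'SELECT FLOOR("__time" TO DAY) AS "day", '
    f'THETA_SKETCH_ESTIMATE(DS_THETA("{SKETCH_METRIC}")) AS "users" '
    f'FROM "{DATASOURCE}" GROUP BY 1'
)

print(json.dumps(rollup_metrics_spec, indent=2))
print(daily_users_sql)
```

Because theta sketches can be merged, grouping by other dimensions at query time still gives a unique-user estimate for each group.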
Here’s a couple of docs that might help to achieve this:
@romanm we can help with sketches in Polaris using the backend in the meantime. Can you let me know which Polaris application you are using?
Hi @romanm, I’ll be happy to take a look. What is the name of your Polaris environment?