Hey everyone,
I’m having a problem using the tuple sketch aggregation with rollup. I reduced it to this minimal case:
One input stream with one field “session” containing a high-cardinality ID, and a metric field “value”. With this query we can aggregate them into a tuple sketch and use it:
{
  "queryType": "timeseries",
  "dataSource": {
    "type": "table",
    "name": "test"
  },
  "intervals": [
    "2000-01-01/2022-01-01"
  ],
  "aggregations": [
    {
      "type": "arrayOfDoublesSketch",
      "name": "session_values",
      "fieldName": "session",
      "numberOfValues": 1,
      "metricColumns": ["value"]
    }
  ],
  "postAggregations": [
    {
      "type": "arrayOfDoublesSketchToString",
      "name": "details",
      "field": {
        "type": "fieldAccess",
        "fieldName": "session_values"
      }
    }
  ],
  "granularity": {
    "type": "all"
  }
}
But since the input is exceedingly large, this query is quite slow. Moving the aggregation to a metric field and enabling rollup performs the aggregation at ingestion time. I can see the raw data from the rolled-up tuple sketch using this scan query:
{
  "queryType": "scan",
  "dataSource": {
    "type": "table",
    "name": "test"
  },
  "intervals": [
    "2000-01-01/2022-01-01"
  ],
  "granularity": {
    "type": "all"
  }
}
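For reference, a rollup metric of this kind would be declared in the ingestion spec roughly as follows. This is only a sketch of the relevant fragment, not the exact spec in use: the metric name tuple_sessions matches the one used in the query below, while the segmentGranularity and queryGranularity values are placeholder assumptions.

```json
{
  "metricsSpec": [
    {
      "type": "arrayOfDoublesSketch",
      "name": "tuple_sessions",
      "fieldName": "session",
      "numberOfValues": 1,
      "metricColumns": ["value"]
    }
  ],
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  }
}
```

With a spec like this, the raw session and value columns are consumed at ingestion and only the sketch column tuple_sessions is stored.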
But then I can’t use the pre-aggregated tuple sketch. I tried this query, where tuple_sessions is the name of the tuple sketch metric:
{
  "queryType": "timeseries",
  "dataSource": {
    "type": "table",
    "name": "test"
  },
  "intervals": [
    "2000-01-01/2022-01-01"
  ],
  "aggregations": [],
  "postAggregations": [
    {
      "type": "arrayOfDoublesSketchToString",
      "name": "details",
      "field": {
        "type": "fieldAccess",
        "fieldName": "tuple_sessions"
      }
    }
  ],
  "granularity": {
    "type": "all"
  }
}
It gives this error:
Missing fields [[tuple_sessions]] for postAggregator [details] at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 332]
I also tried using the exact same timeseries query as in the first example, in case Druid was able to optimize that on its own, but then the session field doesn’t exist anymore.
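The error message suggests that a fieldAccess post-aggregator resolves its fieldName against the outputs of the aggregations list rather than against stored columns. Under that reading, one untested variant would merge the stored sketches with an arrayOfDoublesSketch aggregator that reads the metric column directly and omits metricColumns, which the Druid DataSketches documentation describes as the case where fieldName is assumed to already hold a sketch. The output name session_values is arbitrary, and whether this is the intended approach is not confirmed here:

```json
{
  "queryType": "timeseries",
  "dataSource": {
    "type": "table",
    "name": "test"
  },
  "intervals": [
    "2000-01-01/2022-01-01"
  ],
  "aggregations": [
    {
      "type": "arrayOfDoublesSketch",
      "name": "session_values",
      "fieldName": "tuple_sessions",
      "numberOfValues": 1
    }
  ],
  "postAggregations": [
    {
      "type": "arrayOfDoublesSketchToString",
      "name": "details",
      "field": {
        "type": "fieldAccess",
        "fieldName": "session_values"
      }
    }
  ],
  "granularity": {
    "type": "all"
  }
}
```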
How can we use tuple sketch with rollup?