Can we do nested set operations with thetaSketch?

Hi guys,
thetaSketch is useful, and I am now exploring its limitations.

Can we do complex set operations such as:

(A UNION B) INTERSECT (C UNION D)

In the docs, I saw:

{
  "type": "thetaSketchEstimate",
  "name": "final_unique_users",
  "field": {
    "type": "thetaSketchSetOp",
    "name": "final_unique_users_sketch",
    "func": "INTERSECT",
    "fields": [
      {
        "type": "fieldAccess",
        "fieldName": "A_unique_users"
      },
      {
        "type": "fieldAccess",
        "fieldName": "B_unique_users"
      }
    ]
  }
}

This can do (A INTERSECT B), but can we do nested set operations?

Thanks for your response!

I think you can do it by having nested groupBy.

Hi,

You should be able to write something like this:

{
  "type": "thetaSketchEstimate",
  "name": "your_metric",
  "field": {
    "type": "thetaSketchSetOp",
    "name": "A_U_B__I__C_U_D",
    "fields": [
      {
        "type": "thetaSketchSetOp",
        "name": "A_U_B",
        "func": "UNION",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "A"
          },
          {
            "type": "fieldAccess",
            "fieldName": "B"
          }
        ]
      },
      {
        "type": "thetaSketchSetOp",
        "name": "C_U_D",
        "func": "UNION",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "C"
          },
          {
            "type": "fieldAccess",
            "fieldName": "D"
          }
        ]
      }
    ]
  }
}

I missed the “INTERSECT” above; the following should work:

{
  "type": "thetaSketchEstimate",
  "name": "your_metric",
  "field": {
    "type": "thetaSketchSetOp",
    "name": "A_U_B__I__C_U_D",
    "func": "INTERSECT",
    "fields": [
      {
        "type": "thetaSketchSetOp",
        "name": "A_U_B",
        "func": "UNION",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "A"
          },
          {
            "type": "fieldAccess",
            "fieldName": "B"
          }
        ]
      },
      {
        "type": "thetaSketchSetOp",
        "name": "C_U_D",
        "func": "UNION",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "C"
          },
          {
            "type": "fieldAccess",
            "fieldName": "D"
          }
        ]
      }
    ]
  }
}
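
For anyone building these post-aggregators programmatically, here is a minimal Python sketch of the same nesting. The helper names (`field_access`, `set_op`, `estimate`) are hypothetical, not part of Druid or any client library; they only produce the JSON shape shown above, assuming the aggregators A, B, C and D already exist in the query.

```python
import json


def field_access(field_name):
    """Reference an existing thetaSketch aggregator by its output name."""
    return {"type": "fieldAccess", "fieldName": field_name}


def set_op(name, func, fields):
    """Build a thetaSketchSetOp; `fields` may hold fieldAccess entries or other set ops."""
    if func not in ("UNION", "INTERSECT", "NOT"):
        raise ValueError(f"unsupported func: {func}")
    return {
        "type": "thetaSketchSetOp",
        "name": name,
        "func": func,
        "fields": list(fields),
    }


def estimate(name, field):
    """Wrap a sketch-producing post-aggregator in a thetaSketchEstimate."""
    return {"type": "thetaSketchEstimate", "name": name, "field": field}


# (A UNION B) INTERSECT (C UNION D)
a_u_b = set_op("A_U_B", "UNION", [field_access("A"), field_access("B")])
c_u_d = set_op("C_U_D", "UNION", [field_access("C"), field_access("D")])
post_agg = estimate(
    "your_metric",
    set_op("A_U_B__I__C_U_D", "INTERSECT", [a_u_b, c_u_d]),
)

print(json.dumps(post_agg, indent=2))
```

Because `set_op` accepts other `set_op` results in `fields`, deeper nesting is just a matter of composing the calls.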

Brilliant!

The current doc is:

{
  "type": "thetaSketchSetOp",
  "name": <output name>,
  "func": <UNION|INTERSECT|NOT>,
  "fields": <the name field value of the thetaSketch aggregators>,
  "size": <16384 by default, must be max of size from sketches in fields input>
}

Would you update the doc to include the nested query grammar?

On Friday, February 19, 2016 at 10:47:14 PM UTC+8, Himanshu Gupta wrote:

Updated: https://github.com/druid-io/druid/pull/2514/files

Saw that. Maybe we should give more nested op examples to demonstrate its usage.
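
As one possible extra example, the Python sketch below evaluates (A INTERSECT B) NOT (C UNION D) over exact sets standing in for sketches. It assumes NOT means the first field minus every later field, which is my understanding of the extension rather than something confirmed in this thread. The `evaluate` helper and the sample user ids are made up for illustration; real theta sketches return approximate estimates, not exact counts.

```python
def evaluate(node, sets):
    """Evaluate a thetaSketchSetOp-shaped spec over exact Python sets."""
    if node["type"] == "fieldAccess":
        return set(sets[node["fieldName"]])
    if node["type"] != "thetaSketchSetOp":
        raise ValueError(f"unexpected node type: {node['type']}")

    # Evaluate children first, so nesting of any depth works.
    operands = [evaluate(child, sets) for child in node["fields"]]
    result = operands[0]
    for other in operands[1:]:
        if node["func"] == "UNION":
            result = result | other
        elif node["func"] == "INTERSECT":
            result = result & other
        elif node["func"] == "NOT":
            # Assumption: NOT is set difference, first operand minus the rest.
            result = result - other
        else:
            raise ValueError(f"unsupported func: {node['func']}")
    return result


def fa(name):
    return {"type": "fieldAccess", "fieldName": name}


# (A INTERSECT B) NOT (C UNION D)
spec = {
    "type": "thetaSketchSetOp",
    "name": "A_I_B__N__C_U_D",
    "func": "NOT",
    "fields": [
        {"type": "thetaSketchSetOp", "name": "A_I_B", "func": "INTERSECT",
         "fields": [fa("A"), fa("B")]},
        {"type": "thetaSketchSetOp", "name": "C_U_D", "func": "UNION",
         "fields": [fa("C"), fa("D")]},
    ],
}

# Hypothetical user ids per aggregator.
sample = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {3}, "D": {4, 9}}
result = evaluate(spec, sample)
print(sorted(result), len(result))
```

The `spec` dictionary has the same shape as the `field` value of a thetaSketchEstimate, so it could be wrapped in one the same way as the earlier example in this thread.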

On Tuesday, February 23, 2016 at 1:57:41 AM UTC+8, Fangjin Yang wrote: