How to use a having clause in a groupBy query with pydruid

Hello!

I am trying to run a groupBy query using pydruid.

{
  "queryType": "groupBy",
  "dataSource": "datasource_name",
  "dimensions": [],
  "aggregations": [
    {
      "fieldName": "userId",
      "fieldNames": [
        "userId"
      ],
      "type": "cardinality",
      "name": "COUNT_DISTINCT(userId)"
    }
  ],
  "granularity": "all",
  "postAggregations": [],
  "intervals": "2020-07-15T00:00:00+00:00/2020-07-16T00:00:00+00:00",
  "having": {
    "type": "not",
    "havingSpec": {
      "type": "lessThan",
      "aggregation": "COUNT(date)",
      "value": "3"
    }
  }
}

The above is the raw query, which I would like to convert to the pydruid way of doing it, specifically using its groupby method. The code below shows what I tried.

from pydruid.client import *
from pydruid.utils.aggregators import cardinality, count
from pydruid.utils.filters import Dimension
from pydruid.utils.having import *

query = PyDruid('http://xx.x.x.xx:8082/', 'druid/v2')

group = query.groupby(
    datasource='datasource_name',
    granularity='all',
    intervals='2020-07-08T00:00:00.00/2020-07-09T00:00:00.00',
    dimensions=[],
    having={"type": "greaterThan", "aggregation": cardinality("date"), "value": 2},
    aggregations={"count_distinct_uid": cardinality("userId")}
)

The problem is that I don't understand how to use the having clause. I searched for more details on it but couldn't find any, and I am also a newbie to Druid.

An answer with a brief explanation, or an alternative solution using pydruid, would be much appreciated.

Thanks in advance.

Are you getting any kind of error or does the query not return the same results?

You can also use the Avatica JDBC driver, combined with JayDeBeApi (https://pypi.org/project/JayDeBeApi/) from Python, or with Spark SQL if you are using PySpark. With JDBC you can just run the SQL query directly.

vijay

I get the error shown below:

Traceback (most recent call last):
  File "newTest.py", line 14, in <module>
    aggregations={"count_distinct_uid": cardinality("userId")}
  File "/home/sluser/venvs/airflow_venv/lib/python3.6/site-packages/pydruid/client.py", line 305, in groupby
    query = self.query_builder.groupby(kwargs)
  File "/home/sluser/venvs/airflow_venv/lib/python3.6/site-packages/pydruid/query.py", line 394, in groupby
    return self.build_query(query_type, args)
  File "/home/sluser/venvs/airflow_venv/lib/python3.6/site-packages/pydruid/query.py", line 309, in build_query
    query_dict[key] = Having.build_having(val)
  File "/home/sluser/venvs/airflow_venv/lib/python3.6/site-packages/pydruid/utils/having.py", line 76, in build_having
    return having_obj.having["having"]
AttributeError: 'dict' object has no attribute 'having'

Does the query work in the pydruid console? It might be better to ask in the GitHub repo: https://github.com/druid-io/pydruid/issues
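
Going by the traceback, Having.build_having returns having_obj.having["having"], so it seems to expect an object that carries a .having attribute rather than a plain dict, which would explain why the dict passed as having= fails. The sketch below reproduces that behaviour with a hypothetical stand-in class (HavingStandIn is not part of pydruid; only its build_having line is copied from the traceback). In pydruid itself, the pydruid.utils.having module that the script already imports should supply the real objects. As far as I know it exposes an Aggregation class with overloaded comparison operators, so something like having=(Aggregation("count_distinct_uid") > 2) is worth trying, but treat that as an assumption and check it against your installed version.

```python
# Stand-in sketch: HavingStandIn is hypothetical and only mimics the
# attribute access shown in the traceback (pydruid/utils/having.py, line 76).


class HavingStandIn:
    """Wraps a native Druid having spec in the nesting the traceback implies."""

    def __init__(self, spec_type, aggregation, value):
        # Same nesting that build_having unwraps: obj.having["having"]
        self.having = {
            "having": {
                "type": spec_type,
                "aggregation": aggregation,
                "value": value,
            }
        }

    @staticmethod
    def build_having(having_obj):
        # Mirrors the failing line from the traceback.
        return having_obj.having["having"]


def try_build(having_arg):
    """Return the built spec, or the error message if building fails."""
    try:
        return HavingStandIn.build_having(having_arg)
    except AttributeError as exc:
        return str(exc)


# A plain dict, as in the question: a dict has no `.having` attribute.
as_dict = {"type": "greaterThan", "aggregation": "count_distinct_uid", "value": 2}
dict_result = try_build(as_dict)

# A wrapper object: build_having can unwrap it into the native spec.
as_object = HavingStandIn("greaterThan", "count_distinct_uid", 2)
object_result = try_build(as_object)

print(dict_result)
print(object_result)
```

Note also that in a native Druid having spec the "aggregation" field is the name of an aggregator defined in the same query (here count_distinct_uid), not an aggregator object such as cardinality("date").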