Unique Counts

Hello Druid Gurus,

I have been trying to get unique counts but was not successful. I tried both cardinality and hyperUniqueCardinality. What could be the issue? please help.

  • First i tried with ‘cardinality’ but i get error “null exception”. Below is my query and the error.

{

“queryType”: “groupBy”,

“dataSource”: “PCIRcvrPrnt”,

“granularity”: “all”,

“dimensions”: [

“PRNT_MERCH_NAME”

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “PRNT_MERCH_KEY”,

“value”: 3691

}

]

},

“aggregations”: [

{

“type”: “doubleSum”,

“fieldName”: “rcvr_12m_ntpv”,

“name”: “rcvr_12m_ntpv”

},

{

“type”: “doubleSum”,

“fieldName”: “rcvr_3m_ntpv”,

“name”: “rcvr_3m_ntpv”

},

{

“type”: “cardinality”,

“fieldName”: [“INDY_NAME”],

“name”: “NUM_INDY_KEY”

}

],

“intervals”: [

“2015-04-01/2015-04-02”

]

}

the error i get is

2015-07-27T23:30:27,153 INFO [qtp1599641416-109] io.druid.server.QueryResource - null exception [87de155c-397f-4f6b-a526-c9db8aa5fbf5]

{

“error” : “null exception”

}

  • Then i tried the similar query using hyperUniqueCardinality But i get incorrect result. unique count should have been 5 but i get 0.

{

“queryType”: “groupBy”,

“dataSource”: “PCIRcvrPrnt”,

“granularity”: “all”,

“dimensions”: [

“PRNT_MERCH_NAME”

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “PRNT_MERCH_KEY”,

“value”: 3691

}

]

},

“aggregations”: [

{

“type”: “doubleSum”,

“fieldName”: “rcvr_12m_ntpv”,

“name”: “rcvr_12m_ntpv”

},

{

“type”: “doubleSum”,

“fieldName”: “rcvr_3m_ntpv”,

“name”: “rcvr_3m_ntpv”

},

{

“type”: “hyperUnique”,

“fieldName”: “INDY_KEY”,

“name”: “NUM_INDY_KEY”

}

],

“postAggregations”: [

{

“type”: “arithmetic”,

“name”: “uniqueInd”,

“fn”: “*”,

“fields”: [

{

“type”: “hyperUniqueCardinality”,

“fieldName”: “NUM_INDY_KEY”

},

{

“type”: “constant”,

Value": 1

}

]

}

],

“intervals”: [

“2015-04-01/2015-04-02”

]

}

**result: **

[ {

“version” : “v1”,

“timestamp” : “2015-04-01T00:00:00.000Z”,

“event” : {

“uniqueInd” : 0.0,

“rcvr_12m_ntpv” : 302190.6875,

“rcvr_3m_ntpv” : 60926.046875,

“NUM_INDY_KEY” : 0.0,

“PRNT_MERCH_NAME” : “GroupCard”

}

} ]

Regards

Karteek

Hi,

Can you please share your indexer json and relevent logs from broker/historical for the “null exception” stack trace?

– Himanshu

Himanshu,

Attached is my indexer task json and also the log generated when indexer task ran. i am not sure where to find stack trace generated by broker node.

Regrads

karteek

PCIRcvrPrnt.json (2.6 KB)

index_hadoop_PCIRcvrPrnt_2015-07-27T18%3A54%3A33.692Z.log (79.3 KB)

by mistake replied to karteek’s email id earlier…

Hi,

Yes, both cardinality and hyperUnique aggregator use “HyperLogLog” algorithm (https://en.wikipedia.org/wiki/HyperLogLog) which is based on approximation so it is expected to get 5.xxx or may be something different (but close to 5) from these aggregators.

For using cardinality aggregator, you would ingest the INDY_NAME column just as a dimension (so nothing in the metricSpec) and use the cardinality aggregator at the time of querying only.

– Himanshu

Himanshu,

i still get “error” : “null exception” for the cardinality query i am running. let me know what logs do i need to pull. below is the query.

{

“queryType”: “groupBy”,

“dataSource”: “PCIRcvrPrntUnq”,

“granularity”: “all”,

“dimensions”: [

“PRNT_MERCH_NAME”

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “PRNT_MERCH_KEY”,

“value”: 3691

}

]

},

“aggregations”: [

{

“type”: “doubleSum”,

“fieldName”: “rcvr_12m_ntpv”,

“name”: “rcvr_12m_ntpv”

},

{

“type”: “doubleSum”,

“fieldName”: “rcvr_3m_ntpv”,

“name”: “rcvr_3m_ntpv”

},

{

“type”: “cardinality”,

“fieldName”: “INDY_NAME”,

“name”: “NUM_INDY_KEY”

}

],

“intervals”: [

“2015-04-01/2015-04-02”

]

}

karteek

Hi Karteek,

I got the same problem, is it resolved by you ?

Thanks.

在 2015年7月31日星期五 UTC+8上午1:32:24,karteek chada写道:

Karteek, do you get teh same error if you run a topN?

Otherwise, a unit test to reproduce error would help.

Hi All,

Sorry for the late reply. For getting unique counts i am using hyperUnique and it solves my purpose. i haven’t tried cardinality in a while. I can give it a shot tomorrow and let you know.

Regards

Karteek