hyperUnique returning 0.0

Hi,

I have a small test dataset on which I am trying to get hyperUnique counts.

{"tp": "102", "eventCode": "100001", "visitor": "b3968b73-6255-4354-ac82-1fb024263b96", "visit": "69ccd0bb-0025-41dd-9590-bd1881806cfc-1", "ts": 1416873602000, "rule": "LKP_Page_TOP", "queue": "mano-Kitchen-Appliances-Desk"}
{"tp": "102", "eventCode": "111111", "visitor": "undefined", "visit": "undefined", "ts": 1416873702000, "rule": "undefined", "queue": "undefined"}
{"tp": "103", "eventCode": "500001", "visitor": "undefined", "visit": "undefined", "ts": 1416873802000, "rule": "undefined", "queue": "undefined"}
{"tp": "103", "eventCode": "400012", "visitor": "3308ff44-2de2-49ce-a037-22d6f0c03b5b", "visit": "3308ff44-2de2-49ce-a037-22d6f0c03b5b-2", "ts": 1416873902000, "rule": "undefined", "queue": "Service-Email-Deflection"}
{"tp": "104", "eventCode": "500001", "visitor": "undefined", "visit": "undefined", "ts": 1416874002000, "rule": "undefined", "queue": "undefined"}

This was loaded using the attached file indextask.json.

Running a query such as

{
  "queryType": "groupBy",
  "dataSource": "oe2",
  "granularity": "day",
  "dimensions": [],
  "aggregations": [
    {"type": "count", "name": "count"},
    {"type": "hyperUnique", "name": "visitorssu", "fieldNames": "visitor"},
    {"type": "hyperUnique", "name": "visitu", "fieldNames": "visit"},
    {"type": "cardinality", "name": "distinct_tps", "fieldNames": ["tp"]},
    {"type": "cardinality", "name": "distinct_events", "fieldNames": ["eventCode"]}
  ],
  "intervals": ["2014-11-25T00:00:00.000/2014-11-26T00:00:00.000"]
}

gives an output of

{
  "version" : "v1",
  "timestamp" : "2014-11-25T00:00:00.000Z",
  "event" : {
    "distinct_events" : 4.003911343725148,
    "distinct_tps" : 3.0021994137521975,
    "count" : 5,
    "visitorssu" : 0.0,
    "visitu" : 0.0
  }
}

Why do the hyperUnique aggregators return 0.0? Is there anything wrong with my configuration?

The use case for hyperUnique is an (approximate) high-cardinality distinct visitor count for a website.

I am using Druid version 0.8.2.

Thanks in advance

Manohar

indextask.json (1.11 KB)

I also tried

"queryType": "groupBy",
"dataSource": "oe2",
"granularity": "day",
"dimensions": [],
"aggregations": [
  {"type": "count", "name": "count"},
  {"type": "hyperUnique", "name": "visitorssu", "fieldNames": "visitoru"},
  {"type": "hyperUnique", "name": "visitu", "fieldNames": "visitu"},

where the field name in the query refers to the metric name in indextask.json.

Both queries return 0.0 for the hyperUnique results.

Hey Manohar,

For the field that you indexed as {"type" : "hyperUnique", "name" : "visitoru", "fieldName" : "visitor"}, you should query it with:

{
  "type" : "hyperUnique",
  "name" : "visitoru",
  "fieldName" : "visitoru"
}

The idea is that the indexing spec reads a field called "visitor" and creates an HLL column called "visitoru". At query time you want to read the "visitoru" column.
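To make the ingestion-time versus query-time naming concrete, here is a minimal Python sketch that derives the query-time aggregators from the ingestion metrics. The "visitoru" entry comes from the reply above; the "visitu" entry and the helper name query_aggregators are assumptions, since indextask.json is not shown in the thread.

```python
import json

# Ingestion-time metrics. "visitoru" is quoted in the reply above; the
# "visitu" entry is an assumption, because indextask.json is not shown.
metrics_spec = [
    {"type": "hyperUnique", "name": "visitoru", "fieldName": "visitor"},
    {"type": "hyperUnique", "name": "visitu", "fieldName": "visit"},
]


def query_aggregators(metrics):
    """Build query-time hyperUnique aggregators (hypothetical helper).

    At query time, "fieldName" must point at the stored HLL column, which
    is the metric's "name" from ingestion, not the raw input field.
    """
    return [
        {"type": "hyperUnique", "name": m["name"], "fieldName": m["name"]}
        for m in metrics
        if m["type"] == "hyperUnique"
    ]


query = {
    "queryType": "groupBy",
    "dataSource": "oe2",
    "granularity": "day",
    "dimensions": [],
    "aggregations": [{"type": "count", "name": "count"}]
    + query_aggregators(metrics_spec),
    "intervals": ["2014-11-25T00:00:00.000/2014-11-26T00:00:00.000"],
}

# Print the JSON body that would be sent to the broker.
print(json.dumps(query, indent=2))
```

Deriving the query from the ingestion spec keeps the two sets of names from drifting apart.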

Thanks Glan,

This worked:

{"type": "hyperUnique", "name": "visitoru", "fieldName": "visitoru"},
{"type": "hyperUnique", "name": "visitu", "fieldName": "visitu"},

I think I had a typo: I was using "fieldNames" instead of "fieldName".

Sorry, I should have noticed this myself.
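Since the query in this thread returned 0.0 rather than an error, a typo like this is easy to miss. As a rough sketch, a small client-side check can catch it before the query is sent. The function check_hyperunique_aggs is a hypothetical helper, not part of Druid, and the metric names passed to it are assumed to match indextask.json.

```python
def check_hyperunique_aggs(aggregations, metric_names):
    """Return a list of problems found in hyperUnique aggregators.

    Hypothetical lint helper: flags a missing "fieldName" key (with a hint
    when "fieldNames" was used instead) and a "fieldName" that does not
    match any ingested metric name.
    """
    problems = []
    for agg in aggregations:
        if agg.get("type") != "hyperUnique":
            continue  # other aggregator types are not checked here
        name = agg.get("name", "<unnamed>")
        if "fieldName" not in agg:
            hint = ""
            if "fieldNames" in agg:
                hint = " (found 'fieldNames'; did you mean 'fieldName'?)"
            problems.append(f"{name}: missing 'fieldName'{hint}")
        elif agg["fieldName"] not in metric_names:
            problems.append(
                f"{name}: 'fieldName' {agg['fieldName']!r} is not an ingested metric"
            )
    return problems


ingested = {"visitoru", "visitu"}  # assumed metric names from indextask.json

original = [{"type": "hyperUnique", "name": "visitorssu", "fieldNames": "visitor"}]
fixed = [{"type": "hyperUnique", "name": "visitoru", "fieldName": "visitoru"}]

print(check_hyperunique_aggs(original, ingested))
print(check_hyperunique_aggs(fixed, ingested))
```

The first call reports the "fieldNames" typo from the original query; the second returns an empty list for the corrected aggregator.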

The response is below:

"event" : {
  "distinct_events" : 4.003911343725148,
  "visitoru" : 3.0021994137521975,
  "distinct_tps" : 3.0021994137521975,
  "hihhcount" : 5,
  "visitu" : 3.0021994137521975
}

Thanks and Regards

Manohar