Cardinality

Hi,
As per my requirement, I need to generate a report for the last 30 days with the number of users for each type of action on each day. I didn't go for the cardinality aggregator since it seemed to be supported only for timeseries queries, and with a timeseries query I would have to make a separate call for each action. Instead I used a groupBy query (on user identity and action, with a granularity of one day) with a count aggregation. I parsed this result to get the number of users for each action for each day. But over time this query became really heavy, and the queries started timing out.
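For illustration, the per-user groupBy described above might look roughly like the following Druid native query, built here as a Python dict. The datasource name `events` and the dimension names `subscriberId` and `actionType` are hypothetical stand-ins, not the poster's actual schema; the `count` aggregator named `actions` matches the name mentioned later in the thread.

```python
import json
from datetime import date, timedelta


def build_per_user_groupby(datasource: str, end: date, days: int = 30) -> dict:
    """Hypothetical sketch of a groupBy on (day, user, action).

    Dimension and datasource names are assumptions for illustration only.
    """
    start = end - timedelta(days=days)
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "day",
        # One output row per (day, user, action): the result size grows with
        # the number of distinct users, which is why this gets heavy over time.
        "dimensions": ["subscriberId", "actionType"],
        "aggregations": [{"type": "count", "name": "actions"}],
        # Druid intervals are start-inclusive and end-exclusive.
        "intervals": [f"{start.isoformat()}/{end.isoformat()}"],
    }


query = build_per_user_groupby("events", date(2024, 1, 31))
print(json.dumps(query, indent=2))
```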

So I switched back to the cardinality aggregator with a timeseries query, but the counts I get now are different, and the deviation is approximately 30-40%. Is there anything I need to do to improve the accuracy of the cardinality aggregator? Please help.

-Suresh

Hey Suresh,

You can actually use the cardinality aggregator with other kinds of queries. A groupBy or topN on action with a cardinality aggregator over users should work.
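A minimal sketch of that suggestion, assuming the same hypothetical `events` datasource and `subscriberId`/`actionType` dimensions: group only on the action at day granularity and let a `cardinality` aggregator estimate distinct users per bucket. The resulting JSON body would be POSTed to the broker's `/druid/v2/` endpoint; that step is left out here.

```python
import json


def build_cardinality_groupby(datasource: str, interval: str) -> dict:
    """Hypothetical groupBy on action with an approximate distinct-user count."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "day",
        # Group only on the action: one row per (day, action), not per user.
        "dimensions": ["actionType"],
        "aggregations": [
            # Approximate distinct count of users within each (day, action) bucket.
            {"type": "cardinality", "name": "users", "fields": ["subscriberId"]}
        ],
        "intervals": [interval],
    }


query = build_cardinality_groupby("events", "2024-01-01/2024-01-31")
body = json.dumps(query).encode("utf-8")
print(body.decode("utf-8"))
```

Because the user dimension is no longer grouped on, the result has one row per day and action, so its size no longer grows with the number of users.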

The cardinality aggregator is approximate, but the error should not be that large: the expected error is around 3%, give or take a bit. Can you confirm which query you were using for the groupBy on user and action, and which query you’re using with the cardinality aggregator, just to make sure they match?
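To put the two figures side by side, a small helper can compute the relative deviation between an exact distinct count and the approximate one. The 3x slack multiplier below is an arbitrary choice for this sketch, and the numbers are illustrative only. As the reply suggests, a gap far outside the expected error more likely means the two queries are not computing the same thing (different filters, intervals, or dimensions) than that the aggregator needs tuning.

```python
EXPECTED_ERROR = 0.03  # rough figure from the reply above, not a hard bound


def relative_error(exact: int, approx: float) -> float:
    """Relative deviation of an approximate distinct count from the exact one."""
    if exact <= 0:
        raise ValueError("exact count must be positive")
    return abs(approx - exact) / exact


def looks_suspicious(exact: int, approx: float, slack: float = 3.0) -> bool:
    """Flag deviations well outside the expected error (slack is arbitrary)."""
    return relative_error(exact, approx) > slack * EXPECTED_ERROR


# Illustrative numbers: a 35% gap is far outside ~3%; a 2.5% gap is not.
print(relative_error(1000, 650))    # → 0.35
print(looks_suspicious(1000, 650))  # → True
print(looks_suspicious(1000, 975))  # → False
```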

Hi Gian,

Please see below query details.

GroupBy: we are filtering by customer and querying with two different actionType values.

Hey Suresh,

How are you computing the number of unique users for an action using the first query? Are you using the value of “actions” from that “count” aggregator somehow?

Hi Gian,

No, I am not using the result of the count aggregation. In the result, events are separated by subscriber ID, date and action. I maintain a map keyed by day and action type, and while parsing the result I increment the corresponding entry for each subscriber row for that day and action. This way I get the count of distinct users.
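The client-side counting described above could be sketched like this, assuming the groupBy result rows have the usual `timestamp`/`event` shape and the same hypothetical `subscriberId`/`actionType` dimension names. Since the groupBy already returns one row per (day, user, action), counting rows per (day, action) gives an exact distinct-user count. It also means the query must return one row for every such combination, which is why it becomes heavy.

```python
from collections import defaultdict


def count_users_per_day_and_action(rows):
    """Count distinct users per (day, action) from per-user groupBy rows.

    Assumes each row is one (day, subscriberId, actionType) group, so each
    row corresponds to exactly one distinct user for that day and action.
    """
    counts = defaultdict(int)
    for row in rows:
        event = row["event"]
        day = row["timestamp"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        counts[(day, event["actionType"])] += 1
    return dict(counts)


# Hypothetical sample rows in groupBy result format.
rows = [
    {"version": "v1", "timestamp": "2024-01-01T00:00:00.000Z",
     "event": {"subscriberId": "u1", "actionType": "login", "actions": 4}},
    {"version": "v1", "timestamp": "2024-01-01T00:00:00.000Z",
     "event": {"subscriberId": "u2", "actionType": "login", "actions": 1}},
    {"version": "v1", "timestamp": "2024-01-01T00:00:00.000Z",
     "event": {"subscriberId": "u1", "actionType": "logout", "actions": 2}},
]
print(count_users_per_day_and_action(rows))
```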

-Suresh

I’m having trouble understanding what you are doing. Can you include the Druid query?