What are the differences between groupBy and topN?
For small result sets (fewer than about 1,000 results), topN is almost always the better choice, and this limit can be adjusted. TopN is "approximate" and fast; groupBy is exact and (currently) slow. A major difference arises when you group by more than one dimension: topN ranks the values of a single dimension, so for multiple dimensions you will need groupBy.
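To make the single-dimension versus multi-dimension difference concrete, here is a minimal sketch that builds the two native JSON query bodies as Python dicts. The datasource, interval, dimension and metric names are hypothetical placeholders, and the field layout follows Druid's native topN and groupBy query format as I understand it; check it against the query docs for your Druid version before relying on it.

```python
import json

# Hypothetical names, for illustration only.
DATASOURCE = "my_datasource"
INTERVALS = ["2015-09-12/2015-09-13"]
AGGREGATIONS = [{"type": "longSum", "name": "edits", "fieldName": "count"}]


def topn_query(dimension: str, threshold: int) -> dict:
    """topN: ranks values of exactly ONE dimension by a metric."""
    return {
        "queryType": "topN",
        "dataSource": DATASOURCE,
        "intervals": INTERVALS,
        "granularity": "all",
        "dimension": dimension,   # a single dimension, not a list
        "metric": "edits",        # the metric to rank by
        "threshold": threshold,   # how many top values to return
        "aggregations": AGGREGATIONS,
    }


def groupby_query(dimensions: list, limit: int) -> dict:
    """groupBy: groups by any number of dimensions, exactly."""
    return {
        "queryType": "groupBy",
        "dataSource": DATASOURCE,
        "intervals": INTERVALS,
        "granularity": "all",
        "dimensions": dimensions,  # a list: one or more dimensions
        "aggregations": AGGREGATIONS,
        # Ordering and limiting are expressed separately via limitSpec.
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": "edits", "direction": "descending"}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(topn_query("page", 10), indent=2))
    print(json.dumps(groupby_query(["page", "user"], 10), indent=2))
```

The topN body takes a single `dimension` string, while the groupBy body takes a `dimensions` list, which is why grouping by two or more dimensions requires groupBy.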
The approximate part of topN comes from the fact that intermediate results are truncated before they are sent on to be merged: each node passes along only its top entries, not every value it has seen. As a result, both the metric values and the final order can be approximate. In practice this only matters when there are many results whose metric values are nearly tied. The default truncation threshold for intermediate results is 1,000, but it can be raised or lowered.
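A toy simulation shows how truncating intermediate results can change both the winner and its metric value. This is not Druid's implementation: the partitions stand in for per-node partial results, and the parameter `k` (a hypothetical name) plays the role of the intermediate truncation threshold described above, shrunk from 1,000 to 1 or 2 so the effect is visible.

```python
from collections import Counter


def ranked(totals, n):
    """Sort by value descending (ties broken by key) and keep the top n."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def local_top(partition, k):
    """Per-node step: keep only this node's k largest entries."""
    return dict(ranked(partition, k))


def merge(partials):
    """Merge step: sum the partial counts from every node."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values for matching keys
    return total


def exact_top_n(partitions, n):
    return ranked(merge(partitions), n)


def approximate_top_n(partitions, n, k):
    # Truncate each node's result to k entries BEFORE merging.
    return ranked(merge(local_top(p, k) for p in partitions), n)


# Two hypothetical nodes with nearly tied values.
partitions = [
    {"x": 5, "y": 4},
    {"z": 5, "y": 4},
]

print(exact_top_n(partitions, 1))             # [('y', 8)]
print(approximate_top_n(partitions, 1, k=1))  # [('x', 5)]
print(approximate_top_n(partitions, 1, k=2))  # [('y', 8)]
```

With `k=1`, "y" is dropped on both nodes because it is second on each, even though its combined total of 8 is the largest; raising `k` to 2 keeps enough intermediate entries for the merge to recover the exact answer. This mirrors why a higher truncation threshold trades speed for accuracy.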