Hi Druid experts,
We have been using Druid to compute PV (page views) and UV (unique visitors) for a few days. As more and more users load data into Druid, we have run into a problem: UV queries with a group-by clause perform badly.
We have a Druid cluster with 20 nodes and one datasource for both UV and PV computing. Each hourly interval contains 2 million rows spread across 54 shards.
PV queries take less than 0.4 s.
However, a UV query with a group-by clause takes 12 seconds, and the query time grows further as the query spans more intervals.
But if we split "filter xx in [A,B,C] groupby xx" into three separate queries (select UV for A, select UV for B, and select UV for C), **each query takes less than 0.4 s.**
PS: We compute UV with DataSketches, with size = 16384.
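To make the comparison concrete, here is a sketch of the two query shapes as Druid native-query JSON, built in Python. Everything not stated in the post is an assumption: the datasource name `uv_pv`, the sketch column `user_sketch`, the interval, the use of the `thetaSketch` aggregator (a guess based on the `size` parameter), and the assumption that each per-value query is a `timeseries` query with a `selector` filter.

```python
import json

# Hypothetical placeholders: the post does not name the datasource,
# the sketch column, or the exact interval.
DATASOURCE = "uv_pv"
DIMENSION = "xx"
SKETCH_FIELD = "user_sketch"
INTERVALS = ["2019-01-01T00:00:00Z/2019-01-01T01:00:00Z"]  # one hourly interval (placeholder)


def uv_aggregation():
    # Assumed thetaSketch aggregator (druid-datasketches extension),
    # using the size = 16384 mentioned in the post.
    return {"type": "thetaSketch", "name": "uv", "fieldName": SKETCH_FIELD, "size": 16384}


def group_by_query(values):
    """Slow form: a single groupBy over all filtered values."""
    return {
        "queryType": "groupBy",
        "dataSource": DATASOURCE,
        "granularity": "all",
        "intervals": INTERVALS,
        "dimensions": [DIMENSION],
        "filter": {"type": "in", "dimension": DIMENSION, "values": list(values)},
        "aggregations": [uv_aggregation()],
    }


def per_value_queries(values):
    """Fast form: one query per value, each with a single-value filter."""
    return [
        {
            "queryType": "timeseries",
            "dataSource": DATASOURCE,
            "granularity": "all",
            "intervals": INTERVALS,
            "filter": {"type": "selector", "dimension": DIMENSION, "value": v},
            "aggregations": [uv_aggregation()],
        }
        for v in values
    ]


if __name__ == "__main__":
    vals = ["A", "B", "C"]
    print(json.dumps(group_by_query(vals), indent=2))
    for q in per_value_queries(vals):
        print(json.dumps(q))
```

Running it prints the single groupBy query followed by the three per-value queries; either form can be POSTed to the broker's `/druid/v2` endpoint to compare timings.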
Question: why is the DataSketches UV calculation so slow with a group-by clause?
Any suggestions would be greatly appreciated. Thanks a lot!