TopN query ordered by cardinality is very slow

Hi all,

I have been testing the new sketches recently and found that using this metric in a TopN query makes the query extremely slow. Our dataset contains about 6M rows, and the dimension we use for the TopN query has a cardinality of about 3M. I tried a TopN query ordered by a simple longSum metric and it returns really fast. But when I change the longSum to thetaSketch, it becomes really slow and times out (5 min).

Any suggestions to help with the performance? Thanks.
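
For readers following along, the two queries being compared might look roughly like the sketch below. The data source, dimension, column names, threshold, and interval are all placeholders, since the post does not give them; only the general topN shape and the longSum / thetaSketch aggregator types come from Druid's native JSON query format.

```python
import json


def build_topn(aggregator, metric_name):
    """Build a Druid native topN query body ordered by `metric_name`.

    All names and the interval below are hypothetical placeholders.
    """
    return {
        "queryType": "topN",
        "dataSource": "my_datasource",           # placeholder
        "dimension": "user_id",                  # placeholder for the ~3M-cardinality dimension
        "metric": metric_name,                   # the aggregator the topN is ordered by
        "threshold": 10,                         # placeholder N
        "granularity": "all",
        "aggregations": [aggregator],
        "intervals": ["2016-01-01/2016-01-02"],  # placeholder
    }


# Ordered by a plain longSum: the case reported as fast.
fast_query = build_topn(
    {"type": "longSum", "name": "total", "fieldName": "count"}, "total"
)

# Ordered by a thetaSketch estimate: the case reported as timing out.
slow_query = build_topn(
    {"type": "thetaSketch", "name": "uniques", "fieldName": "visitor_sketch"},
    "uniques",
)

print(json.dumps(slow_query, indent=2))
```

The only difference between the two bodies is the aggregator and the metric it is ordered by, which is what makes the comparison in the post meaningful.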

Hi,

Are you using thetaSketch at ingestion time as well?
Can you paste the metric spec used at ingestion time and the aggregator used at query time?
Is a groupBy query over the same data equally slow?

There can be many reasons for the slowness:

  • thrashing at historicals
  • using thetaSketch only at query time and not at ingestion time
  • too many sketches to be merged

– Himanshu
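
To illustrate the ingestion-time versus query-time distinction raised above, here is a minimal sketch of the aggregator definitions involved. The column names (`visitor_id`, `visitor_sketch`, `uniques`) are hypothetical; the `type`, `name`, and `fieldName` keys are the ones the thetaSketch aggregator uses.

```python
import json

# Ingestion-time metricsSpec entry: a sketch is built from the raw
# `visitor_id` values and stored in the segment as `visitor_sketch`.
ingestion_metric = {
    "type": "thetaSketch",
    "name": "visitor_sketch",
    "fieldName": "visitor_id",
}

# Query-time aggregator when sketches were built at ingestion:
# fieldName points at the stored sketch column, so the query only merges sketches.
query_agg_presketched = {
    "type": "thetaSketch",
    "name": "uniques",
    "fieldName": "visitor_sketch",
}

# Query-time aggregator when sketches were NOT built at ingestion:
# fieldName points at the raw dimension, so sketches are built during the query.
query_agg_raw = {
    "type": "thetaSketch",
    "name": "uniques",
    "fieldName": "visitor_id",
}

print(json.dumps({"metricsSpec": [ingestion_metric]}, indent=2))
```

Posting both the ingestion spec and the query aggregator, as requested, makes it possible to tell which of the two query-time cases applies.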

It would be good to know the number of buckets being used, as I suspect that you guys are using very large bucket sizes that will require some thought around resources and partitioning.
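
To give a rough sense of why sketch size matters at this cardinality, here is a back-of-envelope estimate. It assumes that "buckets" refers to the thetaSketch `size` parameter (the nominal number of entries), that each retained entry is a 64-bit hash (8 bytes), and that a topN over the dimension ends up holding one sketch per distinct dimension value. The result is a worst-case upper bound under those assumptions, not a measurement; sketches that have seen only a few values are far smaller.

```python
BYTES_PER_ENTRY = 8  # theta sketches retain 64-bit hash values


def worst_case_sketch_bytes(dimension_cardinality, sketch_size):
    """Upper bound: one full sketch per distinct dimension value (an assumption)."""
    return dimension_cardinality * sketch_size * BYTES_PER_ENTRY


cardinality = 3_000_000  # ~3M distinct values, as reported in the thread

# Candidate `size` values; they must be powers of two.
for size in (4096, 16384, 65536):
    gib = worst_case_sketch_bytes(cardinality, size) / 2**30
    print(f"size={size:>6}: up to {gib:,.1f} GiB across all dimension values")
```

Even as a loose bound, this shows how quickly a large `size` multiplies against a multi-million-cardinality dimension, which is why resources and partitioning need attention.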

Kurt Young,
Can you check the CPU usage of the historical nodes while the query is running?

Can you also let us know the “segment/scan/time” value for both types of query when they are fired over the same interval?

I have observed similar behaviour with HLL when running topN queries, and the reason was that these queries are very CPU-intensive.

Thanks

Rohit