How can I tune groupBy query performance when counting events per user ID?

Hi. I’m trying to tune the performance of a groupBy query.
What I want to find is how many users on the service triggered a specific event more than n times within a time range.
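To pin down the semantics being asked for, here is a minimal in-memory sketch in plain Python. It is not how Druid executes the query. It assumes each event is a hypothetical `(user_id, event_name, timestamp)` tuple and treats the time range as half-open `[start, end)`.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event record: (user_id, event_name, timestamp)
Event = Tuple[str, str, int]


def count_frequent_users(events: Iterable[Event], event_name: str,
                         n: int, start: int, end: int) -> int:
    """Number of users who triggered `event_name` more than `n` times in [start, end)."""
    # Step 1: count matching events per user ID (the "group by user id" step).
    per_user = Counter(
        user_id
        for user_id, name, ts in events
        if name == event_name and start <= ts < end
    )
    # Step 2: keep only users above the threshold, then count those users.
    return sum(1 for count in per_user.values() if count > n)


events = [
    ("u1", "purchase", 10), ("u1", "purchase", 20), ("u1", "purchase", 30),
    ("u2", "purchase", 15), ("u2", "view", 16),
    ("u3", "purchase", 99),  # outside the time range used below
]
print(count_frequent_users(events, "purchase", 2, 0, 50))  # 1 (only u1)
```

Step 1 has to keep one counter per distinct user, so its memory grows with the number of users, not with the size of the final answer.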

As far as I know, I need to count events grouped by user ID and then take the cardinality of that result.
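One way to express this two-step computation in Druid's native query language is a nested groupBy. The inner query groups by user ID and applies a `having` filter on the per-user count, and the outer query counts the rows that remain. The sketch below only builds such a query as a Python dict and serializes it to JSON. The datasource name `events` and the dimension names `event_name` and `user_id` are hypothetical. The inner `count` aggregator counts Druid rows, which equals the event count only when rollup is disabled. If rollup is enabled, a `longSum` over the ingestion-time count column would be needed instead.

```python
import json


def build_nested_groupby(datasource: str, event_name: str, n: int, interval: str) -> dict:
    # Inner query: one row per user ID holding that user's event count,
    # restricted to users whose count is greater than n.
    # NOTE: "count" counts Druid rows; with rollup, use a longSum over the ingested count column.
    inner = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "filter": {"type": "selector", "dimension": "event_name", "value": event_name},
        "dimensions": ["user_id"],
        "aggregations": [{"type": "count", "name": "events"}],
        "having": {"type": "greaterThan", "aggregation": "events", "value": n},
    }
    # Outer query: count the rows returned by the inner query, i.e. the number of users.
    return {
        "queryType": "groupBy",
        "dataSource": {"type": "query", "query": inner},
        "granularity": "all",
        "intervals": [interval],
        "dimensions": [],
        "aggregations": [{"type": "count", "name": "users"}],
    }


if __name__ == "__main__":
    query = build_nested_groupby("events", "purchase", 5, "2019-01-01/2019-01-08")
    print(json.dumps(query, indent=2))
```

The inner query still produces one row per qualifying user. Its cost therefore scales with the number of distinct users (millions here), even though the final answer is a single number.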

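In this case, the hyperUnique metrics I indexed at ingestion time were not helpful, because they only estimate the number of distinct users and do not record how many times each user triggered the event.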

I found that counting events grouped by user ID takes a long time.

There are billions of events, and the result cardinality is usually in the millions.

Is there any way to tune this query?

Or do I need to scale up the historical nodes?

I found that the groupBy/query/wait/time metric is very high. It looks like loading data from the segments takes a long time.

On Thursday, January 10, 2019 at 2:03:31 AM UTC+9, toughro...@gmail.com wrote:

How are your deep storage and historical servers configured?

Rommel Garcia

Director, Field Engineering

Deep storage is S3, and I checked that the historical nodes have pulled the segments from S3 to local disk.

Here are the important runtime properties:

druid.server.http.numThreads=25
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7
druid.query.groupBy.maxMergingDictionarySize=1000000000

And the JVM options:

-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=8g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp/druid
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
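The processing-buffer properties and -XX:MaxDirectMemorySize are linked. Druid's configuration docs state that direct memory must be at least druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1). A quick check of the posted values is sketched below. It assumes numMergeBuffers is not set (it does not appear in the properties above) and therefore takes the default documented for Druid releases of that period, max(2, numThreads / 4).

```python
from typing import Optional

GIB = 1024 ** 3


def required_direct_memory(buffer_size_bytes: int, num_threads: int,
                           num_merge_buffers: Optional[int] = None) -> int:
    """Minimum direct memory for Druid processing buffers, per the documented sizing rule."""
    if num_merge_buffers is None:
        # Assumed default when druid.processing.numMergeBuffers is unset.
        num_merge_buffers = max(2, num_threads // 4)
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)


need = required_direct_memory(536_870_912, 7)  # buffer.sizeBytes and numThreads as posted
have = 8 * GIB                                 # -XX:MaxDirectMemorySize=8g
print(need / GIB, need <= have)  # 5.0 True
```

The 8g limit therefore covers the roughly 5 GiB of buffers. This check says nothing about whether a 512 MiB buffer is large enough for a groupBy with millions of result rows.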

Almost all of the segment data appears to be cached in memory (the OS disk cache).

I’m using EC2 r5.xlarge instances for the historical nodes, and there are 3 of them.
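A rough memory and CPU budget for this setup can be worked out from the instance size. The sketch assumes AWS's published r5.xlarge spec of 4 vCPUs and 32 GiB of RAM. It treats RAM minus heap minus maximum direct memory as an upper bound on what is left for the OS page cache, and it ignores the OS itself and other JVM overhead. It also compares the configured druid.processing.numThreads=7 with Druid's documented default of cores minus one.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Instance:
    ram_bytes: int
    vcpus: int


def page_cache_headroom(inst: Instance, heap_bytes: int, max_direct_bytes: int) -> int:
    """Upper bound on RAM left for the OS page cache.

    Ignores the OS itself and JVM memory outside heap and direct buffers
    (metaspace, thread stacks), so the real figure is somewhat lower.
    """
    return inst.ram_bytes - heap_bytes - max_direct_bytes


r5_xlarge = Instance(ram_bytes=32 * GIB, vcpus=4)  # assumed AWS spec
per_node = page_cache_headroom(r5_xlarge, 8 * GIB, 8 * GIB)  # -Xmx8g, MaxDirectMemorySize=8g
print(per_node // GIB, 3 * per_node // GIB)  # 16 48 (per node, across 3 nodes)

# Druid's documented default for druid.processing.numThreads is cores - 1.
default_threads = max(1, r5_xlarge.vcpus - 1)
print(default_threads)  # 3, versus the configured 7
```

If the segments each historical serves are larger than about 16 GiB, some reads may go to disk despite the warm cache. Running 7 processing threads on 4 vCPUs may also leave queries waiting for CPU rather than for I/O.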

On Friday, January 11, 2019 at 9:42:46 PM UTC+9, Rommel Garcia wrote: