Druid 0.13.4 - GroupBy Query Broker Bottleneck

Hey Folks,

Need some help with improving the query latency of a groupBy query. We are on Druid 0.13.4 using groupBy v2.

Query:

{
    "aggregations": [
        {
            "fieldName": "event_id_hll",
            "lgK": 13,
            "name": "count",
            "type": "HLLSketchMerge"
        }
    ],
    "dataSource": "exp",
    "dimensions": [
        "variant",
        "event_name",
        "experiment"
    ],
    "filter": {
        "field": {
            "dimension": "event_name", ....
        },
        "type": "not"
    },
    "granularity": "all",
    "intervals": [
        "2019/2021"
    ],
    "queryType": "groupBy"
}

The above query is run every 5 minutes.

Schema of the table:

Dimensions: experiment, variant, event_name

Metrics: event_id_hll (HLLSketchBuild)

Post compaction: daily segments, each approx. 4 GB (1 partition). Since the full time range is queried every 5 minutes, I thought it would be ideal to have a smaller number of segments/partitions.

Metrics captured:

broker.query.time = 1.4 mins
historical.query.time = 9 secs
broker.query.node.ttfb.timer.p99 ~ 12.5 secs
broker.query.node.time ~ 12.9 secs

Query result set size: approx. 7 MB

Cluster Setup

**40 Historicals/MiddleManagers (co-located)**

Historical runtime:

druid.server.http.numThreads=50
druid.processing.buffer.sizeBytes=2146435072
druid.processing.numThreads=23
druid.segmentCache.locations=[…]
druid.server.maxSize=2040109465600

Historical cache:

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=2147483647

3 Brokers

Broker runtime:

druid.query.groupBy.maxMergingDictionarySize=268435456
druid.broker.http.numConnections=20
druid.server.http.numThreads=50
druid.broker.http.readTimeout=PT5M

Processing threads and buffers:

druid.processing.buffer.sizeBytes=2146435072
druid.processing.numThreads=31
druid.processing.numMergeBuffers=15

It looks like the N-way merge on the Broker is the bottleneck. Is there any way to optimize this merge? No broker caching is enabled yet. Does groupBy v2 do internal chunking of queries based on intervals? If so, caching might help, since we could avoid recomputing the counts for the older days.
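For context, the broker-side cache settings in question would be roughly the lines below (just a sketch of what I would toggle, not enabled today; whether groupBy results are actually cacheable also depends on druid.broker.cache.unCacheable for this version):

druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=2147483647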

Hi Sharanya,

A couple of comments.

  1. The general recommendation for Druid segment size is to limit each segment to either 5 million rows (after rollup) or 300-700 MB, whichever limit is hit first.

  2. Are any of the dimensions multi-valued? Note that a groupBy query explodes the data when grouping on multi-valued dimensions. More details here: https://druid.apache.org/docs/latest/querying/multi-value-dimensions.html

If a dimension is multi-valued, using a filtered dimensionSpec might help you out: https://druid.apache.org/docs/latest/querying/dimensionspecs.html#filtered-dimensionspecs
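For example, a filtered dimensionSpec on event_name could look roughly like the sketch below (the values here are made up, just to show the shape of the spec):

{
    "type": "listFiltered",
    "delegate": {
        "type": "default",
        "dimension": "event_name",
        "outputName": "event_name"
    },
    "values": ["click", "impression"],
    "isWhitelist": true
}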

A couple of things:

  1. "Post compaction: daily segments, each approx. 4 GB (1 partition)."

A segment size of 4 GB is on the higher side. From the docs: "For Druid to operate well under heavy query load, it is important for the segment file size to be within the recommended range of 300mb-700mb."

  2. What do your machine configurations for Historicals and Brokers look like? RAM, CPU cores, disk, etc.
  1. No multi-valued dims.

  2. Will try out the recommended segment size and check if there is any improvement.

  3. Both historical and broker nodes are r5.12xlarge: 48 vCPUs, 384 GiB RAM.

"partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000,
        "maxPartitionSize" : 7500000,
        "assumeGrouped" : false,
        "numShards" : -1,
        "partitionDimensions" : [ ]
      },

The above partitionsSpec yielded a single partition of size 3.4 GB. Perhaps the metadata for the HLL sketch is what is making the segment so bulky?

Are you using batch ingestion? With hashed partitioning and no partitionDimensions, the maxPartitionSize setting is ignored.
With 5 million rows coming to 3.4 GB, try setting targetPartitionSize to 800K to get a shard size of ~600 MB.

Also, how many historical nodes are there in the cluster? On the broker, druid.server.http.numThreads at 50 seems a bit low for a 48 vCPU box. The default value is max(10, (Number of cores * 17) / 16 + 2) + 30. Try bumping it to 100 on the broker.
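Working that formula through for 48 cores: (48 * 17) / 16 + 2 = 53, so the default comes out to max(10, 53) + 30 = 83, which is already above the 50 currently configured. On the broker that change would just be:

druid.server.http.numThreads=100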