Slow hyperUnique performance

Let me preface this by saying that this aggregator is still very fast compared to traditional datastores, so don’t take this mail as me kvetching in ignorance. However, it’s drastically slower than the other aggregators. For an example dataset, we have about 2200 segments comprising 1,213,790,643 records. This query runs in about 1-2 seconds:

{
  "intervals": [
    "2015-11-01T04:00:00/2016-02-01T05:00:00"
  ],
  "metric": "count",
  "granularity": {
    "period": "P1M",
    "timeZone": "Etc/UTC",
    "type": "period"
  },
  "threshold": 10,
  "dataSource": "requests",
  "dimension": "site",
  "aggregations": [
    {
      "fieldName": "count",
      "name": "count",
      "type": "count"
    }
  ],
  "queryType": "topN"
}

This query runs in about 10-20 seconds:

{
  "intervals": [
    "2015-11-01T04:00:00/2016-02-01T05:00:00"
  ],
  "metric": "unique_ips",
  "granularity": {
    "period": "P1M",
    "timeZone": "Etc/UTC",
    "type": "period"
  },
  "threshold": 10,
  "dataSource": "requests",
  "dimension": "site",
  "aggregations": [
    {
      "fieldName": "unique_ips",
      "name": "unique_ips",
      "type": "hyperUnique"
    }
  ],
  "queryType": "topN"
}

I understand that the hyperUnique aggregator is definitely doing more work than the count aggregator. Still, I’m curious whether there’s any way to speed up the hyperUnique query, given that it’s running about 10-20 times slower. If it helps, we’ve got 4 historical nodes, each with the following config:

druid.processing.buffer.sizeBytes=1000000000

druid.processing.numThreads=8

druid.segmentCache.locations=[{"path": "/data/druid/index/historical", "maxSize": 85000000000}]

druid.server.maxSize=85000000000

Any suggestions?

Hi Taylor, is there any chance you can change the metric in the topN to a non-HLL metric? For example, order by the top counts or something, but still include the hyperUnique as an aggregation. If there is, there are a few optimizations that can be done.
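A sketch of what that could look like, reusing the datasource, dimension, and field names from the queries above: the topN is ordered by the "count" metric, while the hyperUnique is still included in the aggregations and returned for each site.

{
  "intervals": [
    "2015-11-01T04:00:00/2016-02-01T05:00:00"
  ],
  "metric": "count",
  "granularity": {
    "period": "P1M",
    "timeZone": "Etc/UTC",
    "type": "period"
  },
  "threshold": 10,
  "dataSource": "requests",
  "dimension": "site",
  "aggregations": [
    {
      "fieldName": "count",
      "name": "count",
      "type": "count"
    },
    {
      "fieldName": "unique_ips",
      "name": "unique_ips",
      "type": "hyperUnique"
    }
  ],
  "queryType": "topN"
}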

We’re gathering the total number of unique visitors for a given timeseries, so I don’t think so. Could you elaborate?

Also, how could I refer to a metric in a topN query if the actual “metric” field is set to something else? Should I treat it as a dimension?

Hi Taylor, I was thinking about optimizations described here: https://github.com/druid-io/druid/pull/2222

That optimization appears to bring it down to around 9 seconds after a restart of the broker, which is definitely nice. Is this performance about what one should expect when using HyperLogLog aggregators? If it’s relevant, our cardinality isn’t that high; it’s generally under 10,000.

I think to understand how to improve performance, it is first important to understand where the bottleneck is. For example, if a single segment is taking several seconds for a topN of a hyperUnique, then you may consider partitioning your segments more to get better parallelization.

How would I determine that? Is there any metric emitted by the historical nodes?

http://druid.io/docs/latest/operations/metrics.html

query/segment/time

It looks like there’s going to be way too much data here; it’s emitting metrics for every single segment, and we have thousands of them on each historical node.

Qualitatively, it looks like the values for the query/segment/time metric range from 1 to 1000. There’s quite a bit of variance in the hundred or so events of that type I read through.

Hmmm. The Druid metrics are designed to be ingested into a system where you can view aggregated totals and averages and slice and dice across various params. For example, if you ingested these metrics back into Druid, you would be able to filter on only the HLL queries. You can also set a "queryId" in the query context, and trace the metrics related to that query by finding all instances of that query ID across the various historicals.
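For example, a query ID can be attached at the top level of the query JSON via the context field; the ID value here is just a placeholder:

"context": {
  "queryId": "slow-hll-topn-001"
}

The query metrics emitted by the brokers and historicals should then carry that ID, which makes it possible to pull together the per-segment timings for a single run.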

I’ve written a blog post before about how to monitor Druid that may be interesting:

So, it doesn’t appear that any particular segment is taking an exceptionally long time. There are a few segments that are taking upwards of a second to complete though. Any suggestions?

Yeah, I think you can look into creating smaller segments with fewer rows and parallelizing the query more (see the sketch after the list below). How big are the segments now? There are many things that can be done to improve performance, but it gets into a much longer discussion based on your particular setup.

Some common ones:

  • merge at historicals, not broker

  • better memory/disk ratio

  • tune for your particular hardware
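On the smaller-segments point: assuming the segments are built with Hadoop batch ingestion, the knob involved is the partitionsSpec in the tuningConfig. A sketch, with a purely illustrative targetPartitionSize (the right value depends on your data):

"tuningConfig": {
  "type": "hadoop",
  "partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": 2500000
  }
}

Lowering targetPartitionSize (or using a finer segmentGranularity in the granularitySpec) produces more, smaller segments, which gives the processing threads more units of work to scan in parallel.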

When you say “merge at historicals,” I’m not sure what you mean. There is some documentation for merging small segments together at http://druid.io/docs/latest/configuration/coordinator.html but it doesn’t seem to describe what you’re referring to.

As for hardware tuning, that’s certainly a possibility, but I think you’d have to tell me what to tune. We have 4 historical nodes, each with 8 CPUs, 24 GB of RAM, and 100 GB of disk space, and the following config:

# Default host: localhost. Default port: 8083. If you run each node type on its own node in production, you should override these values to be IP:8080
druid.host=sjc-wapl-stats4
druid.port=8083
druid.service=historical

# Our intermediate buffer is also very small so longer topNs will be slow.
# In prod: set sizeBytes = 512mb
druid.processing.buffer.sizeBytes=1000000000

# We can only scan 1 segment in parallel with these configs.
# In prod: set numThreads = # cores - 1
druid.processing.numThreads=8

# maxSize should reflect the performance you want.
# Druid memory maps segments.
# memory_for_segments = total_memory - heap_size - (processing.buffer.sizeBytes * (processing.numThreads+1)) - JVM overhead (~1G)
# The greater the memory/disk ratio, the better performance you should see.
druid.segmentCache.locations=[{"path": "/data/druid/index/historical", "maxSize": 85000000000}]
druid.server.maxSize=85000000000
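To make the memory_for_segments formula above concrete with these numbers (24 GB of RAM per node, a 1 GB processing buffer, 8 processing threads) and an 8 GB heap assumed purely for illustration:

memory_for_segments ≈ 24 GB - 8 GB (heap, assumed) - 1 GB * (8 + 1) - ~1 GB (JVM overhead) ≈ 6 GB

So each node would have only roughly 6 GB of page cache available for memory-mapped segments. Shrinking the processing buffers or the heap, or adding RAM, raises the memory/disk ratio that the comment above refers to.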

I would also assume that more historical nodes would improve performance. We currently have about 30GB compressed/120GB uncompressed data in this datasource with about 2.2 billion records, so doing a hyperUnique is always going to involve reading a ton of records.

“merge at historicals” - http://druid.io/docs/0.9.0-rc1/querying/caching.html
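In other words, per the caching docs linked above, per-segment result caching can be turned off at the broker and turned on at the historicals, so that per-segment results are merged on the historicals rather than all being shipped to the broker. A sketch of the relevant runtime.properties, based on those docs:

Broker:

druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

Historicals:

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true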

We can certainly give that a shot, but just qualitatively it seems like that wouldn’t help much. I can see the CPU usage spike on the historical nodes right up until the query returns. It doesn’t seem like the broker is the bottleneck here. Any suggestion on the number of historical nodes we should have with a dataset of our size? Is 4 too few?

the number of historicals isn’t as important as the total number of cores

more cores == scan more segments in parallel

if HLL is slow, creating smaller segments and parallelizing work should help

also, folks are adding a very fast distinct count: https://groups.google.com/forum/#!topic/druid-development/GXRpXfBzfJs