What factors affect broker performance?

The recommended size of a production broker node is very high.
It seems to be almost the same as the size of a historical node.

Since we have been running Druid with large historicals and a small broker, I am curious what factors affect broker performance.

It seems most of the computation is run on historical nodes.

If all the broker nodes do is combine data, then does the broker size depend on:

  1. number of historical nodes

  2. number of data points in the final result

my assumptions here are:

  1. increasing the number of historicals increases the amount of data to be combined

  2. more data points = more memory needed to hold them

however, the caveats here are:

  1. given a query, the same data is returned no matter the number of historicals, so the number of historicals shouldn't affect the broker size.

  2. does time range play a factor too?
    If I query for 90 days of data and then query for 1 day of data, and in my chart I am going to show only 60 points (aggregated, of course), will that affect the broker requirements?

Hi, see inline.

The recommended size of a production broker node is very high.
It seems to be almost the same as the size of a historical node.

These settings assume you’ll be using Druid in a multi-tenant environment where many users may be concurrently accessing the system.

Since we have been running Druid with large historicals and a small broker, I am curious what factors affect broker performance.

It seems most of the computation is run on historical nodes.

If all the broker nodes do is combine data, then does the broker size depend on:

  1. number of historical nodes
  2. number of data points in the final result

The number of data points to merge affects broker performance, and so does the number of queries. The number of historicals and the number of segments do impact broker resources, as the broker has to track the state of the cluster (which segments live on which historicals) and keep that information in memory, but the main performance bottlenecks we see at the broker level come from too many concurrent queries or too many results to merge.
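As a rough illustration (not from the original thread), here is a minimal sketch of that point using Druid's native timeseries query; the broker host/port and the datasource name are assumptions:

```python
import json
import urllib.request

# Hypothetical broker endpoint and datasource; adjust to your cluster.
BROKER_URL = "http://broker-host:8082/druid/v2/"

# A timeseries query: each historical returns partial rows for the segments
# it serves, and the broker merges them into one row per granularity bucket.
# The number of merged rows depends on the interval and granularity,
# not on how many historicals contributed partial results.
query = {
    "queryType": "timeseries",
    "dataSource": "events",                # assumed datasource name
    "granularity": "hour",                 # 24 output rows per queried day
    "intervals": ["2015-01-01/2015-01-08"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

request = urllib.request.Request(
    BROKER_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    results = json.loads(response.read())

# 7 days * 24 hours = 168 rows for the broker to merge and return,
# whether 2 or 20 historicals served the underlying segments.
print(len(results), "result rows")
```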

my assumptions here are:

  1. increasing the number of historicals increases the amount of data to be combined
  2. more data points = more memory needed to hold them

however, the caveats here are:

  1. given a query, the same data is returned no matter the number of historicals, so the number of historicals shouldn't affect the broker size.
  2. does time range play a factor too?
    If I query for 90 days of data and then query for 1 day of data, and in my chart I am going to show only 60 points (aggregated, of course), will that affect the broker requirements?

The interval of the query can impact performance, as longer intervals generally mean more segments to scan and more results to merge. Of course, depending on the query granularity, you can still end up with more results to merge when querying one day than when querying 90 days.
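To make that concrete, a small back-of-the-envelope sketch (granularities and intervals chosen only for illustration):

```python
from datetime import timedelta

def expected_result_rows(interval_days, granularity):
    """Rough count of timeseries buckets the broker must merge and return."""
    bucket_seconds = {"minute": 60, "hour": 3600, "day": 86400}[granularity]
    return int(timedelta(days=interval_days).total_seconds() // bucket_seconds)

# 90 days at day granularity: 90 rows to merge at the broker.
print(expected_result_rows(90, "day"))     # 90

# 1 day at minute granularity: 1440 rows, more broker-side merge work
# despite the much shorter interval.
print(expected_result_rows(1, "minute"))   # 1440
```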

How will one know if merging on broker nodes is the bottleneck in query speed? What are the signs to look for? High CPU usage?

We generally look at query metrics to diagnose this:

https://docs.google.com/spreadsheets/d/15XxGrGv2ggdt4SnCoIsckqUNJBZ9ludyhC9hNQa3l-M/edit#gid=0

We look at the metrics for a particular query and compare the request times of the broker versus those across historical nodes.
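For example, here is a sketch of that comparison, assuming metric events arrive as one JSON object per line with `metric`, `service`, `id`, and `value` fields (the exact field names and service names here are assumptions, not a guaranteed format):

```python
import json
from collections import defaultdict

def summarize_query_time(metric_lines, query_id):
    """Group query/time values by service for a single query ID."""
    times = defaultdict(list)
    for line in metric_lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue
        if event.get("metric") == "query/time" and event.get("id") == query_id:
            times[event.get("service", "unknown")].append(event["value"])
    return times

# Fabricated events in the assumed shape: if the broker's query/time is much
# larger than the slowest historical's, the extra time is being spent merging
# (or queuing) at the broker rather than scanning segments.
sample = [
    '{"metric": "query/time", "service": "druid/historical", "id": "q-1", "value": 180}',
    '{"metric": "query/time", "service": "druid/historical", "id": "q-1", "value": 210}',
    '{"metric": "query/time", "service": "druid/broker", "id": "q-1", "value": 950}',
]

for service, values in summarize_query_time(sample, "q-1").items():
    print(service, "max query/time:", max(values), "ms")
```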

Fangjin,
I am curious, do you use some tool (e.g. Splunk) to parse through the logs, or just manual grepping?

I am trying to find the best way to visualize the Druid logs.

Hi Prashant,

We use the HTTP emitter to send Druid metrics to a web service, and from there we ingest them into a separate Druid cluster.

To debug any issues, we usually run queries over this Druid metrics cluster.
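As a sketch of what the receiving side might look like (assuming the HTTP emitter POSTs a JSON array of metric events to the configured recipient URL; the port and the handling below are hypothetical, not the actual service described above):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class MetricsHandler(BaseHTTPRequestHandler):
    """Accept batches of metric events POSTed by the Druid HTTP emitter."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        events = json.loads(body)   # assumed to be a JSON array of metric events
        for event in events:
            # Hand each event off to an ingestion pipeline here, e.g. a queue
            # feeding the separate Druid metrics cluster. For the sketch, print.
            print(event.get("service"), event.get("metric"), event.get("value"))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Hypothetical port; point druid.emitter.http.recipientBaseUrl at this service.
    HTTPServer(("0.0.0.0", 8090), MetricsHandler).serve_forever()
```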