Recommended configuration for Druid cluster

Hi,

I’ve been setting up a Druid cluster to ingest data from Kafka and I was wondering if there’s a recommended way to cluster the different types of Druid nodes.

Currently, my cluster is set up as follows (based on the documentation of the imply-1.3.0 package):

  • X nodes that each run Historical and MiddleManager processes

  • Y nodes that each run Overlord and Coordinator processes

  • Z nodes that each run a Broker process

Questions (related to performance tuning)

(a) Is it recommended to run Historical and MiddleManager processes on the same nodes? Or should they be separated?

(b) Same question, but for the Overlord and Coordinator processes?

(c) My understanding is that X is the only number that needs to be increased in order to scale:

(i) Kafka real-time ingestion throughput AND

(ii) Query performance

Is that correct? Does Z also need to be increased in order to improve query performance?

(d) Would it be fine to set Y = 1?

Thanks,

Jithin

Hey Jithin, responses inline.

(a) Is it recommended to run Historical and MiddleManager processes on the same nodes? Or should they be separated?

For most use cases this works well. You can consider separating them and scaling them independently if you need to have very fine control over resource allocation.
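For reference, colocating them in the imply distribution just means having one machine's supervise config start both processes. A rough sketch of a combined data-node supervise file (from memory of the imply layout; check the conf/supervise/ files shipped with your package for the exact names and paths):

    :verify bin/verify-java

    # Both data-tier processes on the same machine, sharing its RAM and CPUs
    historical bin/run-druid historical conf
    middleManager bin/run-druid middleManager conf

You'd then start the machine with something like bin/supervise -c conf/supervise/data.conf.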

(b) Same question, but for the Overlord and Coordinator processes?

The overlord and coordinator have relatively low resource requirements and can definitely be run on the same machine. You might consider having a second overlord/coordinator pair on another machine for high availability (if the primary overlord/coordinator fails, the secondary one will take over).
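Failover between the pairs happens through ZooKeeper leader election, so the standby machine just runs the same processes pointed at the same ZooKeeper quorum. A minimal sketch (hostnames are placeholders):

    # common.runtime.properties shared by both master machines
    druid.zk.service.host=zk1.example.com,zk2.example.com,zk3.example.com

    # conf/druid/coordinator/runtime.properties on each machine; each node
    # announces itself under its own druid.host, the nodes elect a leader,
    # and the standby takes over automatically if the leader goes down
    # (use master1.example.com on one box, master2.example.com on the other)
    druid.host=master1.example.com
    druid.port=8081

The Overlord works the same way (it normally listens on port 8090).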

(c) My understanding is that X is the only number that needs to be increased in order to scale:

(i) Kafka real-time ingestion throughput AND

(ii) Query performance

Is that correct? Does Z also need to be increased in order to improve query performance?

X is the most important node type to scale for query and ingestion performance. If you have a very high query load, you’ll also eventually need to increase Z. Exactly when you’ll need to increase Z depends on your queries, but in general I think ratios on the order of 10:1 or higher aren’t unreasonable. Monitoring metrics from Druid will help here (http://druid.io/docs/0.9.1.1/operations/metrics.html). If you want high availability on the query side, you’ll want at least two brokers, which can be put behind a load balancer.
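To get those metrics flowing in the first place, emission is controlled by a few properties in common.runtime.properties. A minimal sketch (property names are from the 0.9.1 configuration docs; the monitor listed is just an example):

    # Emit metrics to the process logs (the default emitter)
    druid.emitter=logging
    druid.emitter.logging.logLevel=info

    # Optional periodic monitors, e.g. JVM stats, emitted every minute
    druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
    druid.monitoring.emissionPeriod=PT1m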

(d) Would it be fine to set Y = 1?

Yes, but as above, for HA you’ll want at least 2.

Thanks David.

A few followup questions:

(e) By 10:1, were you referring to the ratio of X:Z?

(f) What’s the best way to make use of Druid metrics? I can see them being dumped into the log files, but that’s hard to consume.

Is there a GUI-based tool (e.g., Grafana) that can display these metrics? That would make it easier to spot any anomalies.

Thanks,

Jithin

Yes, I was referring to X:Z.

The two most common metrics-handling mechanisms I’m aware of are:

  1. Using the Graphite extension to convert the metrics into Graphite format and visualize them with Composer, Grafana, etc. (see the config sketch after this list). See: http://druid.io/docs/0.9.1.1/development/extensions-contrib/graphite.html
  2. Using Druid itself to store Druid metrics. One way to do this is using the HTTP metrics emitter to write events to Tranquility Server to feed back into Druid (https://github.com/druid-io/tranquility/blob/master/docs/server.md).
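For option 1, roughly what the emitter configuration in common.runtime.properties might look like (the hostname, port, and namespace prefix are placeholders; check the property names against the extension docs linked above):

    # Load the contrib extension in addition to whatever you already load
    druid.extensions.loadList=["graphite-emitter"]

    # Send metrics to your Graphite/Carbon listener
    druid.emitter=graphite
    druid.emitter.graphite.hostname=graphite.example.com
    druid.emitter.graphite.port=2004
    druid.emitter.graphite.eventConverter={"type":"all", "namespacePrefix":"druid"}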

Now that I’m thinking about it, an emitter that posts metrics to Kafka (say, for re-consumption through the Kafka indexing service) might be interesting, but as far as I know one doesn’t exist yet.

Thanks David.

I’ll try the Graphite extension since I already have a Graphite instance running.

–Jithin