Broker numConnections and numThreads

Hi,
Our brokers stopped handling queries. When we set some segments' "used" flag to 0 in MySQL, so that the number of segments decreased, the brokers came back to work.

We use the Kafka Indexing Service to ingest realtime data, and our segmentGranularity is 1H, so our segments are very small (300 KB ~ 50 MB). I wonder whether the number of segments affects the concurrency of the Broker.

So is there any recommended configuration for the Broker's numConnections and numThreads?

We have 2 overlords, 2 coordinators, 19 middleManagers, 8 historicals, and 16 brokers (4 brokers on each of 4 physical machines).

Our Broker configuration is as follows:

druid.broker.balancer.type=connectionCount

# HTTP server threads

druid.broker.http.numConnections=50

druid.server.http.numThreads=100

druid.broker.http.readTimeout=PT1M

druid.broker.retryPolicy.numTries=1

# Processing threads and buffers

druid.processing.buffer.sizeBytes=536870912

druid.processing.numThreads=20

# Query cache

druid.broker.cache.useCache=true

druid.broker.cache.populateCache=true

druid.cache.type=local

druid.cache.sizeInBytes=1200000000

Hi Xinxin,

A good starting point for recommended settings is the production cluster configuration page here: http://druid.io/docs/0.9.1.1/configuration/production-cluster.html

When you say ‘our brokers did not handle queries’, do you mean that all your queries were failing? Were there any exceptions in the logs of the query-serving processes when this happened (broker, historical, peon)?

If you’re having performance issues (which may possibly be because of an excessive number of small segments), the best way to proceed would be to enable metrics so that you can figure out where the issues are happening. This page has more details: http://druid.io/docs/0.9.1.1/operations/metrics.html. Enabling metrics is described on the common config page here: http://druid.io/docs/0.9.1.1/configuration/index.html.
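As a minimal sketch, enabling the simplest emitter and a JVM monitor in common.runtime.properties looks something like the following (property names per the 0.9.x configuration docs; the emission period and monitor list here are illustrative, not a recommendation):

```properties
# Emit metrics to the service log (simplest emitter to start with)
druid.emitter=logging
druid.emitter.logging.logLevel=info

# How often each process emits its metrics
druid.monitoring.emissionPeriod=PT1M

# Example monitor: JVM heap/GC stats (0.9.x class name)
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
```

With this in place, query/time and segment-scan metrics show up in the service logs, which should help narrow down where time is being spent.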

You will be getting very low cache utilization by running 16 brokers, all using local cache. I’d consider running fewer brokers, e.g. 1 per host, and scaling up their heap and cache allotment accordingly.
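For illustration, if the 4 brokers on each host were consolidated into 1, the local cache allotment might scale up roughly like this (the 4x figure is simply the sum of the previous per-broker allotments from your posted config, not a tuned value):

```properties
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
# 4 x 1200000000 bytes, since four brokers collapsed into one
druid.cache.sizeInBytes=4800000000
```

A single larger cache per host avoids the same results being cached redundantly in four separate broker heaps.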

I doubt numConnections/numThreads is your issue, unless you have a very large query volume or very long running queries.
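One sanity check worth doing, though: the Druid docs suggest setting each historical's druid.server.http.numThreads slightly higher than the sum of druid.broker.http.numConnections across all brokers. With your original topology that sum is 16 brokers x 50 connections = 800, so on the historicals (a sizing sketch, not a tuned recommendation):

```properties
# On each historical: sum of broker connections (16 x 50 = 800) + headroom
druid.server.http.numThreads=850
```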

Hi David Lim,

Thank you so much for helping me.

We finally found that the problem was a query using a JavaScript function. Our system is very latency-sensitive and queries need realtime processing, but a query with a JavaScript function would not release resources immediately; it took 10 minutes or longer for performance to return to normal. So we rewrote the query without the JavaScript function.
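For example (a hypothetical filter; the dimension name and threshold are made up), a JavaScript filter like the first object below can often be replaced with Druid's native bound filter, which never invokes the JavaScript engine. The first snippet is the JavaScript version, the second the native replacement (alphaNumeric=true requests numeric comparison in 0.9.x):

```json
{ "type": "javascript", "dimension": "latency_ms",
  "function": "function(v) { return v > 100 }" }

{ "type": "bound", "dimension": "latency_ms",
  "lower": "100", "lowerStrict": true, "alphaNumeric": true }
```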

Perhaps a future version could optimize JavaScript execution, since some of our queries have to use JavaScript functions.

Xinxin

On Thursday, October 27, 2016 at 7:44:55 AM UTC+8, David Lim wrote:

Hi Max Lord,

Thank you so much for helping me.

We now run 1 broker per host, and set the broker config as below:

  • druid.broker.http.numConnections=300
  • druid.server.http.numThreads=50
  • druid.processing.numThreads=60
  • druid.broker.cache.useCache=false
  • druid.broker.cache.populateCache=false

The results turned out much better.

Xinxin

On Friday, October 28, 2016 at 1:24:57 AM UTC+8, Max Lord wrote: