How should I properly configure the MiddleManager?

In the production cluster configuration (http://druid.io/docs/latest/configuration/production-cluster.html), the MiddleManager configuration for an r3.8xlarge machine is as follows.

druid.worker.capacity=9

druid.indexer.fork.property.druid.processing.numThreads=2

Realtime Node Configuration (http://druid.io/docs/latest/configuration/realtime.html) states that the default for processing.numThreads is the number of cores - 1 (or 1).

MiddleManager Configs (http://druid.io/docs/latest/configuration/indexing-service.html) says that the default for worker.capacity is the number of available processors - 1.

According to Realtime Node Configuration and MiddleManager Configs, druid.worker.capacity and processing.numThreads should both be 31 in the MiddleManager production cluster configuration, because r3.8xlarge has 32 cores.

So, I’m confused.

How should I properly set druid.worker.capacity and processing.numThreads for the MiddleManager?

My machine has 24 logical cores (6 cores * 2 sockets = 12 physical cores, * 2 with Hyper-Threading).

The nodes that are responsible for coordination (Coordinator and Overlord nodes) require much less processing.

This configuration is an example of what a production cluster could look like. Many other hardware combinations are possible! Cheaper hardware is absolutely possible.

We personally run r3.8xlarge in production but only use 8 workers, because we use the Hadoop indexing service and so don't need many resources on the MiddleManager.

It really depends on how you index your data.

Thank you for your help.

In my case, is it correct to set both the capacity and the number of processing threads to 23?

On Tuesday, August 2, 2016 at 10:24:57 PM UTC+9, Benjamin Angelaud wrote:

Hi,
A Realtime Node can handle ingesting multiple datasources, and you generally run a single Realtime Node with processing threads = number of cores - 1.

Peons, on the other hand, handle 1 datasource each, and you run multiple peons on a single machine.

A simple general rule there is: worker.capacity * (processing threads per task + 1 thread for ingestion) ~= number of cores.

In your case with 24 cores, I would suggest setting worker.capacity = 8 and the number of processing threads = 2 as a starting point, then tuning further from there depending on ingestion and query rates.
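
As a rough illustration of the rule of thumb above, here is a minimal Python sketch that derives a starting worker.capacity from the core count. The function names are hypothetical helpers, not part of Druid, and rounding down is an assumption; the result is only a starting point to tune from.

```python
from typing import List


def suggest_worker_capacity(cores: int, threads_per_task: int = 2) -> int:
    """Starting value for druid.worker.capacity.

    Applies the rule of thumb from this thread:
        worker.capacity * (processing threads per task + 1) ~= number of cores
    Rounding down is an assumption made here so the machine is not oversubscribed.
    """
    if cores < 1 or threads_per_task < 1:
        raise ValueError("cores and threads_per_task must be positive")
    # Each task needs its processing threads plus one ingestion thread.
    return max(1, cores // (threads_per_task + 1))


def middle_manager_properties(cores: int, threads_per_task: int = 2) -> List[str]:
    """Render the two MiddleManager properties discussed in this thread."""
    capacity = suggest_worker_capacity(cores, threads_per_task)
    return [
        f"druid.worker.capacity={capacity}",
        f"druid.indexer.fork.property.druid.processing.numThreads={threads_per_task}",
    ]


if __name__ == "__main__":
    # 24 logical cores, 2 processing threads per task.
    for line in middle_manager_properties(24):
        print(line)
```

For 24 cores and 2 processing threads per task this gives 24 // 3 = 8 workers, which matches the suggestion above.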

Thanks a lot!!

On Tuesday, August 2, 2016 at 6:23:03 PM UTC+9, Hwansung Yu wrote: