[0.9.1.1] middleManagers not using full worker capacity

Hi,

I’ve seen a couple of threads about this issue but could not find any resolution. We’ve been running with just two middle managers with one worker each. I’ve just started experimenting with adding middle managers that have higher worker capacity, and I’m finding that the additional capacity is being used inconsistently, if at all.

I haven’t found a good pattern or repro case for this. The new middle managers originally come up configured with capacity = 1 (because the AWS instance uses the default config); I then shut each one down, change the config, and restart it. The overlord never seems to use the additional capacity right away, assigning only a single task to the restarted middle manager at first. After some time it inconsistently starts using the additional capacity: I saw one with capacity 3 reach full capacity for a while, but after it completed a few tasks (not sure how many, probably not more than 3 or 4) it was back down to a single task again.

Right now, for example, I’m seeing (currCapacityUsed/workerCapacity): 1/1, 1/1, 1/3, 2/2. Whoops, that last one just flipped to 1/2 as I was composing this post!

I understand that tasks can wait on locks, but that isn’t the problem here: I have 40+ pending tasks and zero waiting tasks. (Our ETL checks for duplicates before submitting new tasks, so we rarely have any waiting tasks.)
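For anyone checking the same thing, those counts can also be pulled straight from the overlord’s task endpoints rather than the console. A minimal check, assuming you run it from the overlord host itself (which listens on port 8080 per the config below):

# tasks that hold a lock and are waiting for a worker slot
curl http://localhost:8080/druid/indexer/v1/pendingTasks

# tasks still waiting to acquire a lock
curl http://localhost:8080/druid/indexer/v1/waitingTasks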

The overlord console is showing the configured worker capacity correctly in its “Remote Workers” section.
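The same numbers come back as JSON from the workers endpoint, which is easier to poll than the console (same host/port assumption as above):

# one entry per middle manager, including capacity and currCapacityUsed
curl http://localhost:8080/druid/indexer/v1/workers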

I was originally running with nothing set for selectStrategy. I tried setting it to “fillCapacity” and “equalDistribution” with no apparent effect.
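In case it matters, the strategy actually in effect can be read back from the overlord’s dynamic worker config endpoint (path as documented for 0.9.x; worth verifying against your version, and it may come back empty if nothing was ever posted):

# current worker behavior config, including selectStrategy
curl http://localhost:8080/druid/indexer/v1/worker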

I looked through an overlord log and nothing jumped out as obviously wrong (but I don’t really know what to look for).

My configs are:

# Common

druid.extensions.loadList=["druid-s3-extensions", "postgresql-metadata-storage", "druid-namespace-lookup", "graphite-emitter"]
druid.cache.type=local
druid.cache.sizeInBytes=100000000
druid.selectors.indexing.serviceName=druid:overlord
druid.selectors.coordinator.serviceName=druid:coordinator
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.emitter=graphite
druid.emitter.graphite.hostname=localhost
druid.emitter.graphite.port=2014
druid.emitter.graphite.eventConverter={"type":"whiteList", …}
druid.emitter.graphite.alertEmitters=["logging"]
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=…
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=…
druid.storage.type=s3
druid.s3.accessKey=…
druid.s3.secretKey=…
druid.storage.bucket=…
druid.storage.baseKey=…

# Overlord

druid.port=8080
druid.service=druid/overlord
druid.indexer.queue.startDelay=PT1M
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
druid.indexer.runner.type=remote
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=#
druid.indexer.logs.s3Prefix=#
druid.indexer.storage.type=metadata

# Middle Manager

druid.service=druid/middlemanager
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=…
druid.indexer.logs.s3Prefix=…
druid.indexer.runner.javaOpts=-server -Xmx6g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dlog4j.configurationFile=…/peonLog4j2.xml
druid.indexer.task.baseTaskDir=…
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.selectors.indexing.serviceName=druid:overlord
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/site/druid/persistent/zk_druid", "maxSize": 0}]
druid.indexer.fork.property.druid.storage.archiveBaseKey=…
druid.indexer.fork.property.druid.storage.archiveBucket=…
druid.indexer.fork.property.druid.storage.baseKey=ci-druid/prod/v1
druid.indexer.fork.property.druid.storage.bucket=…
druid.indexer.fork.property.druid.storage.type=s3
druid.worker.capacity=3 # or 1, or 2

I think I might have figured this out. I did not fully apply the advice Gian provided in an earlier post (https://groups.google.com/forum/#!topic/druid-user/OwEDpa1gEOc) to allot somewhere between 1 and 2 cores per worker. I was trying to put 3 workers on a 4-core machine, which is possibly too close to 1 core per worker; I wonder whether some threads ended up deadlocked, blocking communication with the overlord. Also possibly significant: I had no setting for processing.numThreads on the peons, so it defaulted to 3 (cores - 1). It’s not clear to me whether those threads would actually be doing anything (they’re just for serving queries, right, which we are not doing from our peons), but still.
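Under that theory, the sizing I plan to try on a 4-core middle manager is sketched below: 2 workers at roughly 2 cores each, with the peons’ processing threads pinned to 1. The exact numbers are just my reading of Gian’s advice, not anything prescribed by the docs.

# middle manager on a 4-core box: ~2 cores per worker
druid.worker.capacity=2

# peon processing threads are only used for serving queries against in-flight
# segments, which we don't do from our peons, so keep them at the minimum
druid.indexer.fork.property.druid.processing.numThreads=1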

I’ve recently run 20 workers on a 40-CPU machine and had no problems with the capacity being fully used.

The default select strategy assigns tasks to a single worker until it is full. You can change this behavior with configuration (see http://druid.io/docs/0.9.1.1/configuration/indexing-service.html and search for "worker select strategy") to get a more even distribution of tasks across all your workers.
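For reference, in 0.9.x the select strategy is part of the overlord’s dynamic worker configuration, so switching to equalDistribution looks roughly like the sketch below (endpoint and JSON shape per the docs linked above; verify against your version before relying on it):

curl -X POST -H 'Content-Type: application/json' \
  http://<overlord-host>:8080/druid/indexer/v1/worker \
  -d '{"selectStrategy": {"type": "equalDistribution"}}'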