Most of my workers are idle. Why can I only index one task at a time?

I’m trying to get a POC cluster up and running, and I’m having a lot of trouble getting batch ingestion to run with multiple workers.

I have one Overlord and one middleManager set up, and I’ve tried to configure the middleManager to run multiple peons/workers, but for some reason only one ever picks up tasks. Attached are screenshots of the overlord console showing that I have capacity for 9 workers, but only one in use. The configuration files I’m using are below.

Can anyone help me out with what I’m doing wrong? Why can’t I get the other 8 workers to do work?

Thanks in advance for any help.

Overlord (c4.large - CPU 4, MEM 7.5 GB):

Overlord runtime.properties:

# Default host: localhost. Default port: 8090. If you run each node type on its own node in production, you should override these values to be IP:8080
druid.host=10.0.1.13
druid.port=8080
druid.service=overlord

# Uncomment following property if you are running Middle Manager
druid.indexer.runner.type=remote

# Upload all task logs to deep storage
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=com.marchex.druid
druid.indexer.logs.s3Prefix=prod/logs/v1

druid.indexer.storage.type=metadata

druid.selectors.indexing.serviceName=overlord

Overlord common.runtime.properties:

druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions", "io.druid.extensions:postgresql-metadata-storage"]

# Zookeeper
druid.zk.service.host=10.0.1.73

# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000

# Indexing service discovery
druid.selectors.indexing.serviceName=overlord

# Metrics logging (disabled for examples - change this to logging or http in production)
druid.emitter=logging

druid.indexer.logs.directory=/site/druid/logs

druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://10.0.1.180/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=xxxxx

druid.storage.type=s3
druid.s3.accessKey=XXXXX
druid.s3.secretKey=XXXXX
druid.storage.bucket=com.druid.test
druid.storage.baseKey=druid_

Overlord JVM Flags:

-Xmx3g -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/tmp

Middle Manager (m4.4xlarge - CPU 16, MEM 64GB):

Middle Manager runtime.properties:

druid.host=10.0.1.71
druid.port=8080
druid.service=druid/prod/middlemanager

# Store task logs in deep storage
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=com.marchex.druid
druid.indexer.logs.s3Prefix=prod/logs/v1

# Resources for peons
druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
druid.indexer.task.baseTaskDir=/site/druid/persistent/task/

# Peon properties
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.processing.numThreads=4
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/site/druid/persistent/zk_druid", "maxSize": 0}]
druid.indexer.fork.property.druid.server.http.numThreads=50
druid.indexer.fork.property.druid.storage.baseKey=prod/v1
druid.indexer.fork.property.druid.storage.bucket=com.marchex.druid
druid.indexer.fork.property.druid.storage.type=s3

druid.worker.capacity=9
druid.worker.ip=10.0.1.71
druid.worker.version=0

druid.selectors.indexing.serviceName=overlord

Middle Manager common.runtime.properties:

druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions", "io.druid.extensions:postgresql-metadata-storage"]
druid.extensions.localRepository=extensions-repo

# Zookeeper
druid.zk.service.host=10.0.1.73

# Deep storage (local filesystem for examples - don't use this in production)
druid.storage.type=local
druid.storage.storageDirectory=/site/druid/localStorage

# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000

# Indexing service discovery
druid.selectors.indexing.serviceName=overlord

# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)

Hi Chris,

The default configuration sends all tasks to a single middle manager until it is at capacity. It is designed this way for autoscaling, so that excess middle managers can be taken down when they are not needed. To change this, you can follow the instructions here:

http://druid.io/docs/latest/configuration/indexing-service.html

Search for "selectStrategy"; the value needs to be changed to "equalDistribution".
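I don’t have your exact version in front of me, but per that docs page the select strategy is part of the Overlord’s dynamic configuration, which you submit as JSON rather than set in runtime.properties. As a rough sketch (the host and port are just taken from your Overlord config above, and I’m assuming the standard dynamic-config endpoint), it would look something like:

curl -X POST -H 'Content-Type: application/json' \
  http://10.0.1.13:8080/druid/indexer/v1/worker \
  -d '{"selectStrategy": {"type": "equalDistribution"}}'

The default "fillCapacity" strategy packs one middle manager full before using the next; "equalDistribution" spreads tasks evenly across the available middle managers.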

Hey Chris,

One of those screenshots shows you have a lot of tasks in the “waiting on locks” category. That probably means they overlap in which intervals they are ingesting data for (indexing locks are a per-dataSource, per-interval thing). You can run parallel indexing tasks if you split the data up by time and run one index task per chunk of time.
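For example (the dates here are made up, just to show the shape), if your data covers two days you could submit two index tasks whose granularitySpec intervals don’t overlap:

Task 1:
"granularitySpec": { "type": "uniform", "segmentGranularity": "DAY", "intervals": ["2015-06-01/2015-06-02"] }

Task 2:
"granularitySpec": { "type": "uniform", "segmentGranularity": "DAY", "intervals": ["2015-06-02/2015-06-03"] }

Since the tasks lock different intervals of the same dataSource, they can run at the same time on separate peons instead of waiting on each other’s locks.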