MiddleManager/Peons spawning too many threads

Hi all,

I am having trouble with my middle manager node running out of available threads. I have 8 tasks running on a 32 core CentOS 7.3 system with the following config:

druid.service=druid/middlemanager
druid.port=8091

# Number of tasks per middleManager
druid.worker.capacity=9

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dhadoop.mapreduce.job.user.classpath.first=true
druid.indexer.task.baseTaskDir=/services/druid/data2/task
druid.indexer.task.restoreTasksOnRestart=true
druid.indexer.runner.javaOptsArray=["-XX:OnOutOfMemoryError=kill -9 %p"]

# HTTP server threads
druid.server.http.numThreads=50

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=2

# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=/services/druid/data2/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.6.0-cdh5.7.0"]

My peons are failing to start, or failing mid-run, because they run out of threads. I have checked and there are only 8 java processes started (ps aux | grep middleManager | wc -l), yet there are 2020 threads (ps -eLf | grep middleManager | wc -l). From the config above I would expect on the order of 52 threads (plus maybe a couple of overhead threads) per peon. That would be on the order of 400-450 threads, not 2000+. What am I missing?
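In case it helps anyone reproducing this: grepping `ps -eLf` also counts the grep itself and anything else matching the pattern, so a per-process breakdown (a sketch using standard Linux procps options) may be more precise. NLWP is the thread count of each process:

```shell
# Print each middleManager-related java process with its thread count.
# NLWP = number of lightweight processes (threads) in that process.
for pid in $(pgrep -f middleManager); do
  printf 'pid %s: %s threads\n' "$pid" "$(ps -o nlwp= -p "$pid" | tr -d ' ')"
done
```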

Thanks,

–Ben

Hey Ben,

The expected number of overhead threads is more like 30 than 2 (a few for ZK, a few for monitoring, a few for emitting, a few for the JVM itself, a couple for disk I/O; it adds up). But definitely not hundreds. Could you take a thread dump of one of those peon processes so we can see what they are? You can do that with: jstack -l [pid]
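If the dump is large, a quick way to histogram it by thread name (a sketch; the sed pattern collapses trailing pool indices like HttpClient-Netty-Worker-17 so thread families group together) is:

```shell
# Group threads in a jstack dump by name family and count them,
# most numerous first. Thread names are the quoted strings at the
# start of each stack entry; trailing digits/dashes are stripped.
jstack -l "$pid" \
  | grep '^"' \
  | sed -e 's/^"\([^"]*\)".*/\1/' -e 's/[-#]*[0-9]*$//' \
  | sort | uniq -c | sort -rn
```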

Hi Gian,

I am no longer running the config with 8 tasks because that wasn't working. I am now running 4 with slightly tweaked settings to manage memory (config included inline below). However, it is still using more threads than I would expect. I have attached the dump, but broadly I see:

25 threads for http connections (I reduced this from 50)

18 CompilerThreads

128 HttpClient-Netty-Worker threads

44 G1 Concurrent Refinement

54 Gang Worker threads

A bunch of other miscellaneous threads for various purposes

So a lot of these come from the JVM itself.
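The CompilerThread, G1 refinement, and Gang Worker counts all scale with core count, so on a 32-core box they multiply fast across peons. One option (a sketch; these flags are standard HotSpot options but the values are illustrative assumptions, not recommendations from this thread) is to cap them explicitly in the peon javaOpts:

```
# Appended to druid.indexer.runner.javaOpts -- caps per-peon JVM helper threads.
# Values are illustrative only; tune for your workload.
-XX:ParallelGCThreads=4        # "Gang worker" stop-the-world GC threads
-XX:ConcGCThreads=1            # concurrent G1 marking threads
-XX:G1ConcRefinementThreads=4  # G1 concurrent refinement threads
-XX:CICompilerCount=2          # JIT CompilerThreads
```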

Here is the ioConfig and tuningConfig from my supervisor spec:

"ioConfig" : {
  "topic" : "raw_shopkick_pylons_weblog_avro_v1",
  "useEarliestOffset" : true,
  "consumerProperties": {
    "bootstrap.servers": "kafka001:9092,kafka002:9092,kafka003:9092",
    "max.partition.fetch.bytes": 209715200
  },
  "taskCount" : 4,
  "replicas" : 1,
  "taskDuration": "PT1H"
},
"tuningConfig" : {
  "type" : "kafka",
  "maxRowsInMemory" : 10000,
  "maxRowsPerSegment" : 2000000
}

Thanks for the help.

–Ben

peon_threads.tar.gz (12 KB)

The 128 HttpClient-Netty-Worker threads are being spawned using the formula (numCores * 2), which unfortunately isn't currently configurable. See: https://github.com/druid-io/druid/issues/3301
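For anyone checking the arithmetic: the pool sizes off the logical CPU count the JVM sees, which includes hyperthreads. My reading (an assumption, not confirmed in the issue) is that a 32-core box with hyperthreading reports 64 logical CPUs, and 2 * 64 = 128 matches the observed worker count:

```shell
# Back-of-envelope check of the Netty worker pool size:
# pool = 2 * logical CPUs visible to the JVM (hyperthreads included).
logical=$(nproc)
echo "logical CPUs: $logical, expected Netty workers: $(( logical * 2 ))"
```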