MiddleManager and Peon start "Running shutdown hook" and I cannot find out why

The MiddleManager runs a few segments successfully but then gets shut down.
I'm using Spark with the Tranquility library to write data to a Druid cluster. A side question: I don't have to deploy a realtime node or a Tranquility server, right? What exactly is a realtime node? In my setup, are the realtime nodes actually the Peon processes created by the MiddleManager?

In MiddleManager’s log I find:

2018-01-16T20:25:35,911 INFO [Thread-26] io.druid.java.util.common.lifecycle.Lifecycle - Running shutdown hook

There is no exception or failure before that. After it there is a series of "unannouncing", "Invoking stop method", and "Unregistered listener" messages, and shortly afterwards the process dies.

In the task log I find something similar:

2018-01-16T20:25:35,911 INFO [Thread-27] io.druid.cli.CliPeon - Running shutdown hook

Then there are a couple more shutdown log lines. Towards the end it hits java.lang.RuntimeException: java.lang.InterruptedException.

MiddleManager runtime.properties:

druid.service=druid/middleManager
druid.port=8091

# Number of tasks per middleManager
druid.worker.capacity=8

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers on Peons
#druid.indexer.fork.property.druid.processing.buffer.sizeBytes=268435456
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.processing.numThreads=2

# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.3"]
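As far as I understand, the druid.indexer.fork.property.* entries above are passed to each Peon with the prefix stripped, so every task JVM should effectively start with something like this (illustrative, derived from the settings above rather than copied from a task log):

druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=2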

MiddleManager jvm.config:

-server
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=4g
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

The traffic is not huge, about 25k messages/second. I tried filtering out 90% of the traffic before sending it to Druid, but that didn't help.

I have one machine dedicated to the MiddleManager: 16 vCPUs and 104 GB of memory. I do see CPU usage climb to 30-40% when the tasks are running.

I tried increasing the number of segment partitions from 1 to 2 or 4, and that didn't seem to help either. I checked the segment sizes of the successful tasks; they are about 10 MB.

What’s the next thing I should check?

I’m experiencing a similar issue.

I'm running a test cluster with 3 nodes (broker; overlord & coordinator; historical & MiddleManager). It runs OK for a few hours (I'm doing a single batch ingestion) and then I see a shutdown hook running, without any errors (nothing in the syslog either).

2018-08-22T02:21:53,154 INFO [NodeTypeWatcher[overlord]] io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[overlord:DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/overlord', host='ip----.us-***-1.compute.internal', port=-1, plaintextPort=8090, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='overlord', services={}}] disappeared.
2018-08-22T02:21:53,154 INFO [Thread-50] io.druid.java.util.common.lifecycle.Lifecycle - Running shutdown hook
2018-08-22T02:21:53,158 INFO [Thread-50] io.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='ip----.us-***-1.compute.internal', port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='coordinator', services={}}].
2018-08-22T02:21:53,158 INFO [Thread-50] io.druid.curator.announcement.Announcer - unannouncing [/mnt/druid/zk/internal-discovery/coordinator/ip----.us-***-1.compute.internal:8081]
2018-08-22T02:21:53,170 INFO [Thread-50] io.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/coordinator', host='ip----.us-***-1.compute.internal', port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='coordinator', services={}}].
2018-08-22T02:21:53,170 INFO [Thread-50] io.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2018-08-22T02:21:53,173 INFO [Thread-50] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@7a04f730{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
2018-08-22T02:21:53,180 INFO [Thread-50] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@11c88cca{/,[jar:file:/home/ubuntu/druid-0.12.1/lib/druid-console-0.0.2.jar!/io/druid/console, jar:file:/home/ubuntu/druid-0.12.1/lib/druid-server-0.12.1.jar!/static],UNAVAILABLE}

Did you solve your issue? Does anyone else have an idea about it?

Any ideas about this? I need to restart the Druid processes every day…

I am facing the same issue. Can you please share details on how I can get past it?

Anything in the overlord log? If you have the task ID, you can grep for it there. I don't have an answer, but the overlord log is the next thing I would check.
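For example (the log path and task ID below are just hypothetical placeholders; substitute your own):

# search the overlord log for the failing task's ID (both values are placeholders)
grep "index_realtime_my_datasource_2018-01-16T20:00:00.000Z_0_0" /var/log/druid/overlord.log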