Kafka indexing service (KIS) - Shutdown failed error keeps repeating in a log file

Hi,

We use KIS to read data from Kafka.

Most likely Kafka was down and the task couldn’t read the data.

The following couple of messages keep repeating in the log file (attached):

Sent shutdown message to worker: andruid1.dev:8091, status 200 OK

Shutdown failed for index_kafka_supervisor-flink-02_539014b40b0359d_dmlalanb! Are you sure the task was running?

I wonder how I can get rid of these messages:

  • e.g. how to clear the task from the queue.

  • how to find out what happened with the task.

Tx.

Vlad

shutdown_failed_error.log (35 KB)

Hey Vlad,

To clean up the task queue, I would try removing the entries from the druid_tasks and druid_tasklocks table in the metadata storage.
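
Just as a rough sketch of that cleanup (assuming a MySQL metadata store with the default druid_ table prefix; the host, credentials, database name, and column names are placeholders from memory of the default schema - verify them against your own metadata store before running anything):

# Rough sketch only: delete the stuck task's rows from the metadata store.
# Assumes MySQL and the default "druid_" table prefix; all connection
# details below are placeholders.
import pymysql

TASK_ID = "index_kafka_supervisor-flink-02_539014b40b0359d_dmlalanb"

conn = pymysql.connect(host="metadata.host", user="druid",
                       password="diurd", database="druid")  # placeholders
try:
    with conn.cursor() as cur:
        # Remove the lock entries first, then the task entry itself.
        cur.execute("DELETE FROM druid_tasklocks WHERE task_id = %s", (TASK_ID,))
        cur.execute("DELETE FROM druid_tasks WHERE id = %s", (TASK_ID,))
    conn.commit()
finally:
    conn.close()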

To find out what happened with the task, you can click the log link next to the task in the overlord console (by default http://{OVERLORD_IP}:8090) or fetch the log directly at:

http://{OVERLORD_IP}:8090/druid/indexer/v1/task/index_kafka_supervisor-flink-02_539014b40b0359d_dmlalanb/log
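
If you’d rather script it than click through the console, here’s a minimal sketch; the overlord address is a placeholder, and I’m assuming the standard /status and /log task endpoints on the overlord:

# Minimal sketch: pull a task's status and log from the overlord HTTP API.
# Replace the overlord address with your own (default port 8090).
import requests

OVERLORD = "http://overlord.host:8090"  # placeholder
TASK_ID = "index_kafka_supervisor-flink-02_539014b40b0359d_dmlalanb"

# Task status (e.g. RUNNING, SUCCESS, FAILED)
status = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/status")
print(status.json())

# Task log, saved to a local file
log = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/log")
with open(f"{TASK_ID}.log", "wb") as f:
    f.write(log.content)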

If you could post the log from the task, that would be super helpful, as I’ve been unable to reproduce the exact issue and am curious to know why the process isn’t being killed.

By the way, the ‘shutdown failed’ after getting a 200 OK response looks like a bug to me. We were checking for a 202 response and failing because we got a 200 back instead. I’ll put in a fix for this (but this isn’t really affecting anything other than a scary log message).

Tx, David. I’ll check this particular log on Monday - I hope it’s still stored on disk; otherwise, it’s already gone.

To give you more info: we’re testing Druid on a small cluster with low resources -
3 servers with 4 GB RAM each, with all node types running on every server.

We had set a very low JVM heap limit on the middle manager (peon) processes.

Almost all tasks were failing. I checked some logs, and the tasks either failed with a JVM out-of-memory error or the logs were blank.

Vlad

On Friday, 22 July 2016 at 21:47:57 UTC+2, David Lim wrote:

If you could post the log from the task, that would be super helpful, as I’ve been unable to reproduce the exact issue and am curious to know why the process isn’t being killed.

Just this:

Exception in thread "HttpClient-Netty-Boss-0" java.lang.OutOfMemoryError: Java heap space
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:340)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Thanks, Vladimir. By the way, just in case you hadn’t found this config already: if you’re trying to do ingestion in a low-memory environment, you’ll want to play with the maxRowsInMemory setting to find something that works for your data complexity. Setting this lower will cause the index to spill to disk more frequently, which should help you manage your memory usage.
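
To make that concrete, here’s a hedged sketch of where the setting lives - it goes in the tuningConfig of the Kafka supervisor spec, which you resubmit to the overlord. The datasource name, topic, broker address, and the value 25000 are purely illustrative, and the dataSchema is abbreviated; you’d resubmit your full existing spec with just the lowered value:

# Sketch only: lower maxRowsInMemory in the supervisor's tuningConfig and
# resubmit the spec to the overlord. All names/values are placeholders;
# the dataSchema is abbreviated and must come from your real spec.
import json
import requests

OVERLORD = "http://overlord.host:8090"  # placeholder

spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "flink-02",  # placeholder; copy the rest of your
        # existing dataSchema (parser, granularitySpec, ...) in here
    },
    "tuningConfig": {
        "type": "kafka",
        "maxRowsInMemory": 25000,  # lower => persist to disk more often
    },
    "ioConfig": {
        "topic": "flink-02",  # placeholder
        "consumerProperties": {"bootstrap.servers": "kafka.host:9092"},
    },
}

resp = requests.post(f"{OVERLORD}/druid/indexer/v1/supervisor",
                     data=json.dumps(spec),
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.text)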