Successful indexing task remains "RUNNING"

Hello,

I ingest files into Druid by submitting tasks to a local indexer. Tasks that apparently finish successfully do not shut down properly: I still see the Java process, and the task shows up as running in the coordinator console once indexing is complete. The final message in the task log is:

2016-04-07T19:13:22,316 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index__2016-04-07T19:08:21.399Z",
"status" : "SUCCESS",
"duration" : 293390
}
2016-04-07T19:13:22,321 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@5dc98c7c].
2016-04-07T19:13:22,322 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@58d3f4be]
2016-04-07T19:13:22,322 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/:8100]
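
For context, I submit tasks roughly like this (host/port and the spec file name are placeholders; the endpoint is the standard overlord task API):

curl -X POST -H 'Content-Type: application/json' -d @index_task.json http://localhost:8090/druid/indexer/v1/task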

What can I do to shut things down properly?

Thanks,

/David

Hi David,
Task JVMs should exit after a task completes.

Which version of Druid are you using?

Can you try updating to the latest Druid version to see if that fixes your issue?

If you are still seeing it with the latest version, can you take a thread dump and share it for further diagnosis?
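
If it helps for the dump, something like this against the task's JVM should work (the pid is whatever jps reports for the peon process):

jps -l
jstack <pid> > task-threads.txt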

Hello Nishant,

Thanks for your reply. Here are links to three logs:

  • The indexer task log.
  • The overlord log.
  • The thread dump.

Druid version is 0.8.3 - a clean slate, no previous datasources, nothing.

Thank you,

/David

Hi,
Thanks for the info.

I looked at the thread dump; it seems the task is stuck unannouncing the segment, waiting on the ZooKeeper commit.

I suspect it's due to running an old version of ZooKeeper.

Which version of ZooKeeper are you using?

Can you try updating your ZooKeeper to v3.4.6 in case you are running an older version?
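
You can check the running version with ZooKeeper's four-letter-word commands, assuming the default client port:

echo stat | nc localhost 2181 | head -1

The first line prints something like "Zookeeper version: 3.4.6-...".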

I had the same thought - we are running ZooKeeper 3.3.5 :-/ I updated to 3.4.8 and I can now successfully query the data. I do see some shutdown-related errors towards the bottom of the indexer log file, see here.

Thanks for the hint!

/David

OK, so as indicated in my previous post I got a task to complete. I then tried with more data (~8 GB) while setting the overlord runtime property druid.indexer.runner.javaOpts=-server -Xmx8g. Towards the end, the indexing task failed with this message:

11T16:14:08.269Z/work/__REPLACED___clean_hits_2016-02-09T00:00:00.000Z_2016-02-10T00:00:00.000Z_2016-04-11T16:14:08.277Z_0/merged/v8-tmp] completed inverted.drd in 8,811 millis.
2016-04-11T18:06:53,517 ERROR [task-runner-0] com.metamx.common.guava.CloseQuietly - IOException thrown while closing Closeable.
java.io.IOException: Expected [42,336,598] bytes, only saw [0], potential corruption?
    at com.metamx.common.io.smoosh.FileSmoosher$1.close(FileSmoosher.java:211) ~[java-util-0.27.4.jar:?]
    at com.metamx.common.guava.CloseQuietly.close(CloseQuietly.java:36) [java-util-0.27.4.jar:?]
    at com.metamx.common.io.smoosh.FileSmoosher.add(FileSmoosher.java:145) [java-util-0.27.4.jar:?]
    at com.metamx.common.io.smoosh.FileSmoosher.add(FileSmoosher.java:119) [java-util-0.27.4.jar:?]
    at com.metamx.common.io.smoosh.FileSmoosher.add(FileSmoosher.java:114) [java-util-0.27.4.jar:?]
    at com.metamx.common.io.smoosh.Smoosh.smoosh(Smoosh.java:62) [java-util-0.27.4.jar:?]
    at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:946) [druid-processing-0.8.3.jar:0.8.3]
    at io.druid.segment.IndexMerger.merge(IndexMerger.java:358) [druid-processing-0.8.3.jar:0.8.3]
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:244) [druid-processing-0.8.3.jar:0.8.3]
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:217) [druid-processing-0.8.3.jar:0.8.3]
    at io.druid.indexing.common.index.YeOldePlumberSchool$1.finishJob(YeOldePlumberSchool.java:169) [druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:399) [druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:213) [druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:285) [druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:265) [druid-indexing-service-0.8.3.jar:0.8.3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2016-04-11T18:06:53,629 WARN [task-runner-0] io.druid.indexing.common.index.YeOldePlumberSchool - Failed to merge and upload
java.io.IOExcept

Is this symptomatic of a resource issue? I could index the data in smaller chunks. This is a POC for historical data, and I did not want to go to the length of re-compiling Druid against our Hadoop version etc., hence the local indexer job.

Regards,

David

I ended up splitting the indexing into several index tasks - how can I submit several tasks but only have one active task at a time? druid.indexer.queue.maxSize=1 actually prevents me from submitting new tasks while a task is running.

Thanks,

David

Hi David,
For the IOException you saw with the bigger dataset, please check your complete logs; I believe the exception you posted is not the root cause of the failure. I wonder if there was an OOME before the IOException that caused it.
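
A quick way to check, if the task log is still around (the file name below is a placeholder):

grep -n -i "OutOfMemoryError" task.log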

I didn't quite get why you would want only one task to run at a time, but in any case you can limit the number of tasks run on a worker by setting druid.worker.capacity (http://druid.io/docs/latest/configuration/indexing-service.html).
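
For example, in the worker's runtime.properties (the value shown is just illustrative):

druid.worker.capacity=1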

Hello, thanks for the answer.

I don’t want to run just one task but I was looking for a way to configure the maximum number of concurrent tasks. "druid.worker.capacity" seems to do what I want.

Regarding the exception, it’s likely that there was an OOM error somewhere; I don’t have the logs anymore, though. What are the memory requirements of an indexing task? Does the input data have to fit into the heap I assigned to the index runner via "druid.indexer.runner.javaOpts=-server -Xmx8g"?

Thanks again,

/David

Hi David,
The memory requirements for indexing also vary depending on how much rollup is happening and the number of dimensions and metrics you have.

It is generally recommended to keep your segment size around 5M rows.

For 8 GB of data, I would suggest starting with roughly 10 shards. You may be fine with fewer shards if you have pretty good rollup.
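
If you do split into shards, the 0.8.x index task's tuningConfig can cap rows per segment; a rough sketch, with values that are illustrative rather than tuned for your data:

"tuningConfig": {
  "type": "index",
  "targetPartitionSize": 5000000,
  "rowFlushBoundary": 500000
}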