Indexing task never finishes

Hello,

My indexing tasks never finish. The indexing task points to an S3 file, and the ingestion spec contains the following job properties to be able to access the file:

"jobProperties" : {
   "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
   "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
   "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
   "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
}

``
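For context, these job properties sit under the `tuningConfig` of the Hadoop indexing task spec — a minimal sketch of the surrounding structure, where the `paths` value is an illustrative placeholder rather than my actual bucket:

```json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "s3n://my-bucket/path/to/data.gz"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : {
        "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
        "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY"
      }
    }
  }
}
```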

1. On the Overlord console, my task stays in the running section.

2. When clicking on “log (all)” or “log (last 8kb)” I get redirected to a blank page.

3. I SSH into the Middle Manager and tail the log at:

```
var/druid/task/<task_id>/log
```

I see the following exceptions:

```
2016-04-18T01:08:14,092 INFO [main] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_hadoop_wikiticker_2016-04-18T01:07:19.194Z] to overlord[http://localhost:8090/druid/indexer/v1/action]: LockTryAcquireAction{interval=2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z}
2016-04-18T01:08:14,092 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://localhost:8090
2016-04-18T01:08:14,094 WARN [HttpClient-Netty-Boss-0] org.jboss.netty.channel.SimpleChannelUpstreamHandler - EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.
java.net.ConnectException: Connection refused: localhost/127.0.0.1:8090
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_72-internal]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_72-internal]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) [netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) [netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) [netty-3.10.4.Final.jar:?]
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:?]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.4.Final.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72-internal]
2016-04-18T01:08:14,094 WARN [main] io.druid.indexing.common.actions.RemoteTaskActionClient - Exception submitting action for task[index_hadoop_wikiticker_2016-04-18T01:07:19.194Z]
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
    at com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:137) ~[http-client-1.0.4.jar:?]
    at com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14) ~[http-client-1.0.4.jar:?]
    at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:101) [druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask.isReady(HadoopIndexTask.java:137) [druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.worker.executor.ExecutorLifecycle.start(ExecutorLifecycle.java:168) [druid-indexing-service-0.9.0.jar:0.9.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_72-internal]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_72-internal]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_72-internal]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_72-internal]
    at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) [java-util-0.27.7.jar:?]
    at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) [java-util-0.27.7.jar:?]
    at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) [druid-api-0.3.16.jar:0.9.0]
    at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:91) [druid-services-0.9.0.jar:0.9.0]
    at io.druid.cli.CliPeon.run(CliPeon.java:237) [druid-services-0.9.0.jar:0.9.0]
    at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.0.jar:0.9.0]
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:8090
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_72-internal]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_72-internal]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.4.Final.jar:?]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.4.Final.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_72-internal]
```
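One way to confirm which Overlord address the peon is actually targeting is to grep it out of the task log — a quick sketch, with a shortened sample log line inlined for illustration (against the real log you would grep the file at the task log path instead):

```shell
# Sample peon log line (shortened); the real line comes from the task log file
line='Submitting action for task[...] to overlord[http://localhost:8090/druid/indexer/v1/action]: ...'

# Extract the overlord[...] portion to see which address the task is using
echo "$line" | grep -o 'overlord\[[^]]*\]'
# → overlord[http://localhost:8090/druid/indexer/v1/action]
```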

I am running 0.9.0 stable.

I am using S3 for deep storage and indexing logs.

My topology is the following:

2xHistorical m4.xlarge

1xCoordinator t2.large

1xOverlord m4.xlarge

1xMiddle Manager m4.xlarge

1xZookeeper t2.large

1xBroker m4.xlarge

Postgres for Metadata

The same indexing task worked when pointed at the cluster running on my local machine. The only differences are the JVM configs and the fact that every node now has its own box.

Thanks in advance for any help.

Are your Overlord and Middle Manager running on the same machine?

Do your Historicals have free space? What are your Druid rules?

Regards,

Andrés

Hello Andres,

The Middle Manager and Overlord are running on different boxes.

The historical nodes should have enough space since I am using S3.

Here is the configuration:

```properties
druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
druid.server.maxSize=130000000000
```

I haven’t set any rules. I just booted up this cluster.

Hi Carlos,
Looks like you might have set druid.host in your Overlord's runtime.properties to localhost, which leads to the task not being able to talk to it.

Can you try removing druid.host so that it picks up the public IP, or setting it manually to the public IP of the machine the Overlord is running on?
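For example, in the Overlord's runtime.properties (a sketch — the IP below is a placeholder for that machine's actual reachable address):

```properties
# Advertise an address other nodes can actually reach; do not leave this as
# localhost on a multi-machine cluster (or omit it so Druid detects the host)
druid.host=10.0.1.23
druid.port=8090
```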

Thank you Nishant.

You were right: I was missing the druid.host property on my Broker and Middle Manager. Once I set those to the IP addresses of each hosting machine, my indexing task was successful.