Issues with Tranquility and Indexing Service

Hello all,

I am new to Druid and am having a few issues using Tranquility to write to my Druid cluster. I am consuming a Kafka stream within Tranquility, and publishing to Druid seems to work. For now I am using schemaless dimensions just to get things rolling; however, my tasks stay ‘pending’. My overlord runner type is remote, with a middleManager. The respective runtime.properties are:

overlord:

druid.host=

druid.port=

druid.service=druid/overlord

druid.indexer.queue.startDelay=PT0M

druid.indexer.runner.javaOpts="-server -Xmx256m"

druid.indexer.fork.property.druid.processing.numThreads=1

druid.indexer.fork.property.druid.computation.buffer.size=100000000

druid.indexer.logs.type=s3

druid.indexer.logs.s3Bucket=bucket

druid.indexer.logs.s3Prefix=druid/tasks/logs

druid.indexer.runner.minWorkerVersion=1

druid.indexer.runner.startPort=

druid.indexer.runner.type=remote

druid.indexer.storage.type=metadata

middlemanager:

druid.host=

druid.port=

druid.service=middlemanager

druid.indexer.logs.type=s3

druid.indexer.logs.s3Prefix=druid/tasks/logs

druid.indexer.logs.s3Bucket=bucket

druid.indexer.task.baseTaskDir=/mnt/persistent/task/

druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

druid.worker.ip=

druid.worker.capacity=5

Tranquility also throws these transient exceptions once the first task goes into pending:

Transient error, will try again in 179 ms

java.io.IOException: Unable to push events to task: index_realtime_2015-09-02T11:11:00.000-04:00_0_0 (status = TaskRunning)

Caused by: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for druid:firehose:Omniture-11-0000-0000

Does this have something to do with my middleManager? I tried starting my overlord with local configs, and tasks actually complete successfully. I also can’t even view the task logs from the coordinator console with my S3 configs.

Any and all help is appreciated.

Thanks,

Hey Nicholas, those exceptions (“Unable to push events to task” with “status = TaskRunning”) are directly related to the task being pending. Tranquility is waiting for the task to show up so it can push events to it.

Have you set druid.selectors.indexing.serviceName in your common.runtime.properties? If not then it should be set to “druid/overlord” (to match your overlord service name).
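A minimal sketch of that check, assuming you can paste the contents of both properties files into strings. The helper names are made up for illustration, and the mismatched value below is only an example:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def service_names_match(common_text, overlord_text):
    """True when the selector in the common config names the overlord's service."""
    selector = parse_properties(common_text).get("druid.selectors.indexing.serviceName")
    service = parse_properties(overlord_text).get("druid.service")
    return selector is not None and selector == service


# Illustrative file contents: the selector uses ':' where the overlord uses '/'.
common = "druid.selectors.indexing.serviceName=druid:overlord\n"
overlord = "druid.service=druid/overlord\n"
print(service_names_match(common, overlord))  # prints False
```

The comparison is an exact string match, so a different separator or a stray space after the value is enough to make it fail.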

If checking that doesn’t help, can you attach logs from your overlord and your middleManager?

Hey Gian,

Thanks for helping!

I changed my indexer.serviceName in my common.runtime.properties from ‘druid:overlord’ to ‘druid/overlord’. That helped, and it looks like Tranquility is sending events to the tasks that are running (until it starts throwing that same ‘no hosts’ exception again). This is all happening with a local indexing runner type. When I change this to remote, unfortunately the tasks stay pending and Tranquility keeps throwing that exception. I’ve attached the logs for the middleManager and overlord. The overlord really only complains about my default worker strategy. The logs run until I shut the services down.

Again, thanks for your help man.

middlemanager.log (262 KB)

overlord.log (127 KB)

My Coordinator console is also showing that tasks keep getting created and thrown into pending status. Why is this? Is my overlord not properly communicating with my middlemanager?

From a snippet of the log: what is duration -1? Does this just mean it hasn’t started running yet?

Duration -1 just means it hasn’t finished yet so the duration is not yet known.

It looks like your overlord and middleManager aren’t able to communicate. Is the middleManager showing up as a “worker” on the overlord web console? (I would guess no). These docs might be helpful to get things configured properly:

http://druid.io/docs/latest/configuration/indexing-service.html

http://druid.io/docs/latest/configuration/production-cluster.html

The “production cluster” configs in particular might be useful since they are a good working example of a distributed (overlord + middleManager) cluster config.
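If you would rather check for registered workers from a script than from the web console, something like the following might work. It is a sketch only: it assumes the overlord serves GET /druid/indexer/v1/workers and returns a JSON list whose entries carry a "worker" object with a "host" field, so verify both against your Druid version. The host names are made up.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def registered_workers(overlord_url, fetch=fetch_json):
    """Return the host of each worker the overlord currently reports."""
    entries = fetch(overlord_url.rstrip("/") + "/druid/indexer/v1/workers")
    # Assumed response shape: [{"worker": {"host": "...", ...}, ...}, ...]
    return [entry.get("worker", {}).get("host") for entry in entries]


# Offline demonstration with a stubbed response instead of a live overlord.
sample = [{"worker": {"host": "mm1.example.com:8080", "capacity": 5}}]
hosts = registered_workers("http://overlord.example.com:8090", fetch=lambda url: sample)
print(hosts if hosts else "no workers registered")
```

Against a real cluster, drop the fetch argument and pass your overlord's address. An empty list with the remote runner would be consistent with tasks sitting in pending, since there is no worker to assign them to.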

Gian! Thanks for replying.

So I was using a custom Docker setup to get everything running in a distributed cluster of containers. Somewhere, I think, that setup messed with my Druid configs. Now my indexing service is not using Docker at all anymore, and it looks like the production configurations you linked are working, I think. My middleManager is throwing this exception for every task and then completing them with a FAILED status:

2015-09-10T14:41:12,507 ERROR [WorkerTaskMonitor-0] io.druid.indexing.worker.WorkerTaskMonitor - I can’t build there. Failed to run task: {class=io.druid.indexing.worker.WorkerTaskMonitor, exceptionType=class java.util.concurrent.ExecutionException, exceptionMessage=java.lang.RuntimeException: org.jets3t.service.ServiceException: Service Error Message. – ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>420D907D844C4AE9</RequestId><HostId>ZoqQ5ZGwXYnK1zTWjpB1I8thA4pj+dwCXLP8zSXfcuGkNo/aDf4cXG7bjREUZA0+</HostId></Error>, task=index_realtime_Omniture_2015-09-10T10:38:00.000-04:00_0_0}

java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.jets3t.service.ServiceException: Service Error Message. – ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>420D907D844C4AE9</RequestId><HostId>ZoqQ5ZGwXYnK1zTWjpB1I8thA4pj+dwCXLP8zSXfcuGkNo/aDf4cXG7bjREUZA0+</HostId></Error>

at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_51]

at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_51]

at io.druid.indexing.worker.WorkerTaskMonitor$1$1.run(WorkerTaskMonitor.java:131) [druid-indexing-service-0.7.0.jar:0.7.0]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_51]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_51]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]

Is this a folder access issue? The exception doesn’t make clear what my middleManager needs access to.

Hey Nicholas, that’s an S3 access error, and it’s probably happening because either your deep storage or task log archiving config is set to a bucket that you don’t have access to. If you do want to use S3 for deep storage and task log archiving, you can adjust which bucket will be used.

From the sample production configs, the things to edit are:

Your IAM keys:

druid.s3.accessKey

druid.s3.secretKey

Deep storage related configs:

druid.indexer.fork.property.druid.storage.type

druid.indexer.fork.property.druid.storage.bucket

druid.indexer.fork.property.druid.storage.baseKey

druid.indexer.fork.property.druid.storage.archiveBucket (if you’re using the ArchiveTask)

druid.indexer.fork.property.druid.storage.archiveBaseKey (if you’re using the ArchiveTask)

Task log archiving related configs:

druid.indexer.logs.type

druid.indexer.logs.s3Bucket

druid.indexer.logs.s3Prefix
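To see at a glance which buckets a node is actually pointed at, a small script can pull just these settings out of a runtime.properties file. This is a rough sketch with a made-up helper name and made-up sample values; it masks the two credential properties so the output is safe to paste into a reply.

```python
# Property prefixes taken from the list above.
S3_RELATED_PREFIXES = (
    "druid.s3.",
    "druid.indexer.fork.property.druid.storage.",
    "druid.indexer.logs.",
)


def s3_related_settings(text):
    """Collect the S3-related key=value pairs from properties text."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or not key.startswith(S3_RELATED_PREFIXES):
            continue
        # Hide druid.s3.accessKey and druid.s3.secretKey values.
        if key.startswith("druid.s3.") and key.endswith("Key"):
            value = "***"
        settings[key] = value
    return settings


# Made-up sample; the credentials are placeholders, not real keys.
sample_props = """
druid.s3.accessKey=AKIAEXAMPLE
druid.s3.secretKey=not-a-real-secret
druid.indexer.fork.property.druid.storage.type=s3
druid.indexer.fork.property.druid.storage.bucket=bucket
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=bucket
druid.worker.capacity=5
"""
for key, value in s3_related_settings(sample_props).items():
    print(key, "=", value)
```

Every bucket that shows up in the output needs to be reachable with the configured IAM keys, otherwise S3 answers with the same 403.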