Can't get Druid + Tranquility to work (NoBrokersAvailableException/Overlord warning)

Hi,

I am trying to get Druid 0.8.0 working in combination with Tranquility (Finagle service), but I am stuck at the following well-known error (see [0], [1], [2], [3]):

Unable to push events to task: index_realtime_dadsaas_2015-08-02T19:52:00.000+02:00_0_0 (status = TaskNotFound)
Caused by: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for druid:firehose:wikipedia-52-0000-0000

I think the deeper problem behind this is that the overlord throws a warning at startup (this has also been reported before, see [4]):

INFORMATION: Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope “Undefined”
Aug 02, 2015 6:33:56 PM com.sun.jersey.spi.inject.Errors processErrorMessages
WARNUNG: The following warnings have been detected with resource and/or provider classes:
WARNING: Parameter 1 of type io.druid.indexing.common.actions.TaskActionHolder from public javax.ws.rs.core.Response io.druid.indexing.overlord.http.OverlordResource.doAction(io.druid.indexing.common.actions.TaskActionHolder) is not resolvable to a concrete type

My general idea was to use the Wikipedia example and ingest data into Druid with my own Java program via Tranquility (Finagle). I tried to build a “basic” setup for testing ingestion via Tranquility/Java. All the examples in the Druid wiki worked (they run without overlord nodes!).

I tried a lot of different configurations to get it running on Debian Jessie 7.1, but I always get stuck at these problems. My ingestion tasks never complete, but the tasks themselves do not throw any errors (I can see the logs on the overlord page) and run forever (listed as “Running” on the overlord page). Some users reported that it works when you repeat the task, but that did not help. Adding a warm-up time did not help either. Different names for the overlord, especially “druid:overlord” and “overlord”, made no difference.

So I am stuck here with no idea how to get this running. I also tried versions down to Druid 0.7.0, with the same problem.

Is it something in Debian Jessie? Except for Java (openjdk7), no Debian package is used.

The only thought I came up with was that maybe I have not configured the firehose correctly. Which firehose type is needed for Tranquility?

I also know that I need the following nodes for this: Coordinator, Broker, Historical, Overlord.

Is a MiddleManager node needed? Or can it be completely replaced with the indexing task mentioned in [5], “Batch Ingestion Using the Indexing Service”? (I tried it several times with both of them; neither worked.)

I hope you can help me to get Druid+Tranquility up and running :wink:

I am trying to provide as much information as possible:

Attachments:

  • Logfiles -> Logs.tar.gz

  • Configfiles -> Configs.tar.gz

  • My Java test code -> TranquilityTest.java

  • dpkg -l -> Dpkg.txt

  • libraries in classpath for TranquilityTest.java -> Libs.txt

uname -a:

Linux vagrant-debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

[0] https://groups.google.com/forum/#!topic/druid-development/PU6njY0gE5U

[1] https://groups.google.com/forum/#!topic/druid-user/UT5JNSZqAuk

[2] https://github.com/druid-io/druid/issues/1448

[3] https://groups.google.com/forum/#!topic/druid-user/LKqvur7wWmo

[4] https://groups.google.com/forum/#!topic/druid-user/1YsRnLPMkhw

[5] http://druid.io/docs/latest/ingestion/batch-ingestion.html

Logs.tar.gz (22.3 KB)

Config.tar.gz (6.9 KB)

Dpkg.txt (76.6 KB)

Libs.txt (3.82 KB)

TranquilityTest.java (4.46 KB)

Hi Andreas,
Can you also check the actual task logs for any exceptions?

Also, you do not seem to be running with the UTC timezone. It is recommended to run all Druid nodes in the UTC timezone.
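
As a quick client-side check, here is a minimal sketch (plain JDK only; the class name UtcCheck is made up for illustration) that reports the JVM default timezone and forces it to UTC in code. It is meant as a rough equivalent of starting the JVM with -Duser.timezone=UTC, not as a replacement for setting that flag on the Druid nodes themselves.

```java
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of Druid or Tranquility.
class UtcCheck {
    /** True if the JVM default zone has a zero raw offset and no daylight-saving rules. */
    static boolean defaultIsUtc() {
        TimeZone tz = TimeZone.getDefault();
        return tz.getRawOffset() == 0 && !tz.useDaylightTime();
    }

    /** Forces the JVM default timezone to UTC for this process only. */
    static void forceUtc() {
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    }

    public static void main(String[] args) {
        System.out.println("default zone before: " + TimeZone.getDefault().getID());
        forceUtc();
        System.out.println("default zone after:  " + TimeZone.getDefault().getID());
    }
}
```

Calling forceUtc() early in the client's main method, before any timestamps are created, should keep the client consistent with nodes started in UTC.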

Hi Nishant,

Thanks for your fast help!

I checked the task logs, they do not have any error.

You are correct, my Debian was on UTC+2. I had just copied the example code from the tutorial with “-Duser.timezone=UTC”. I think that could be the cause.

However, I changed the timezone to UTC and the errors are still there.

I added new logs as an attachment and included Task.log, which contains the output of a task (I hope that is what you meant by task logs).

Any further ideas?

NewLogs.tar.gz (28.4 KB)

Was this issue resolved? I am facing exactly the same issue on Linux using the current version of Druid. I see the task getting submitted on the overlord console, but the task runs for a long time, and below are the errors in Tranquility.

21883 [finagle/netty3-1] WARN com.metamx.tranquility.finagle.FutureRetry$ - Transient error, will try again in 171 ms

java.io.IOException: Unable to push events to task: index_realtime_test_2015-08-26T10:00:00.000-07:00_0_0 (status = TaskRunning)

Hey Amaresh,

What version of Druid and what version of tranquility are you using? On the overlord console, are your tasks showing up pending? Or have they been assigned to a middleManager?

I’m wondering if this is a configuration problem with service discovery. The things to check are that your overlord’s “druid.service” matches the “druid.selectors.indexing.serviceName” in common.runtime.properties, and that both of those match the indexService key in tranquility. With the current versions of druid and tranquility, the defaults should work fine.
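
The three names that have to agree can be compared mechanically. The following is a minimal sketch using only java.util.Properties; the class name ServiceNameCheck and the sample values are assumptions for illustration, and in practice you would pass in the contents of your own overlord runtime.properties and common.runtime.properties plus the indexService string from your Tranquility code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Druid or Tranquility.
class ServiceNameCheck {
    /** Parses Java-properties text, e.g. the contents of a runtime.properties file. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse properties text", e);
        }
        return p;
    }

    /** True only if all three names are present and identical. */
    static boolean namesMatch(Properties overlord, Properties common, String tranquilityIndexService) {
        String service = overlord.getProperty("druid.service");
        String selector = common.getProperty("druid.selectors.indexing.serviceName");
        return service != null
                && service.equals(selector)
                && service.equals(tranquilityIndexService);
    }

    public static void main(String[] args) {
        // Sample values only; substitute the contents of your own config files.
        Properties overlord = parse("druid.service=overlord\n");
        Properties common = parse("druid.selectors.indexing.serviceName=overlord\n");
        System.out.println("names match: " + namesMatch(overlord, common, "overlord"));
    }
}
```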

Hi again,

I am still trying to figure out what the problem is. With 0.8.1 you added “druid.indexer.runner.type=remote” in the overlord config, which brought me one step further, but I still have some problems.

The problem somehow remains the same: the tasks are now running and finishing successfully after 15-20 minutes (which seems very long!?). However, I don’t think ingestion has really worked, as there is still no space used. The Tranquility test script still says “com.twitter.finagle.NoBrokersAvailableException: No hosts are available for druid:firehose:wikipedia-…”.

There are two other errors that I can see in the logs provided below:

historical.log: com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.coordination.BaseZkCoordinator.start() throws java.io.IOException] on object[io.druid.server.coordination.ZkCoordinator@5700c71e].

middleManager.log: 2015-10-13T19:33:00,522 WARN [TaskMonitorCache-0] io.druid.segment.indexing.DataSchema - No metricsSpec has been specified. Are you sure this is what you want?

The last error seems to point to a missing ingestion spec, but I am not sure how it should be provided. Your documentation never mentions how to provide it when ingesting only with Tranquility, without an additional realtime node (which is not needed but can be configured to use a specific ingestion spec). Do I have to provide it somehow for the middleManager? How?

Don’t be confused by the task log timestamps; they seem to be wrong somehow. The tasks hang at "INFO [task-runner-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating … " for about 15-20 minutes until they move on. If you just look at the timestamps, you can’t see that.

Can you help me again? I am not exactly sure what the problem is or could be. It would be very nice if the documentation provided a working Tranquility (Finagle API) example (config + code); that would help!

ZooKeeper is running with the default Debian ‘Jessie’ config.

Attachments:

  • Logfiles -> logs.tar.gz

  • Configfiles -> configs.tar.gz

  • My Java test code -> TranquilityTest.java

  • run.bash -> How I start my Druid cluster (all on a single machine)

Best regards,
Andreas

TranquilityTest.java (4.44 KB)

config.tar.gz (2.92 KB)

logs.tar.gz (129 KB)

run.bash (865 Bytes)

I struggled a bit with this as well.

If you want something to compare against, you might have a look at the Vagrant cluster I put together:
http://brianoneill.blogspot.com/2015/09/druid-vagrant-up-and-tranquility.html

I was able to successfully run finagle against that setup.

Soon we should have some code that we can open source that uses Tranquility with that cluster config as well.

-brian

Hi Brian,

Massive thanks for your blog post, that should help!

I will try and report back.

Hey Andreas,

It looks like your tasks are stacking up too much while rolling over, because your windowPeriod is larger than your segmentGranularity. This causes your indexing service to run out of task slots, so tasks end up taking too long to start, which makes Tranquility complain that it can’t find them when it expects them (the “NoBrokersAvailableException”).

You should be able to fix that resourcing problem by having windowPeriod be shorter than segmentGranularity. Some reasonable setting might be windowPeriod=PT10M and segmentGranularity=HOUR. Note that queries can be served on segments as they’re getting built, so even with segmentGranularity=HOUR you will still be able to query data in real time.
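
To see why the ratio matters, here is a rough back-of-the-envelope sketch (Java 8+, java.time only; the class name TaskOverlap is made up for illustration). It assumes each task stays alive for about segmentGranularity + windowPeriod and ignores the extra time spent persisting and handing off segments, so the real number of occupied task slots can be higher, and it scales further with partitions and replicants.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of Druid or Tranquility.
class TaskOverlap {
    /** True if the window period is strictly shorter than one segment interval. */
    static boolean windowFits(Duration windowPeriod, Duration segmentGranularity) {
        return windowPeriod.compareTo(segmentGranularity) < 0;
    }

    /**
     * Rough estimate of the peak number of tasks alive at once (per partition
     * and replica), ASSUMING a task lives for segmentGranularity + windowPeriod
     * and a new task starts at every segment boundary.
     */
    static long concurrentTasks(Duration windowPeriod, Duration segmentGranularity) {
        long life = segmentGranularity.plus(windowPeriod).toMillis();
        long seg = segmentGranularity.toMillis();
        return (life + seg - 1) / seg; // ceiling division
    }

    public static void main(String[] args) {
        // Suggested settings: windowPeriod=PT10M, segmentGranularity=HOUR.
        Duration window = Duration.parse("PT10M");
        Duration hour = Duration.ofHours(1);
        System.out.println("fits: " + windowFits(window, hour)
                + ", peak tasks: " + concurrentTasks(window, hour));

        // Made-up inverted example: a one-hour window on ten-minute segments.
        System.out.println("peak tasks (inverted): "
                + concurrentTasks(Duration.ofHours(1), Duration.ofMinutes(10)));
    }
}
```

Under these assumptions the suggested settings need about two slots at the hourly rollover, while an inverted setup (window longer than the segment interval) needs several times more, which matches the symptom of tasks waiting for a free slot.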

The “No metricsSpec has been specified. Are you sure this is what you want?” is telling you that you have no aggregators defined, and you may want one. Generally if you aren’t going to have any others, you should at least have a “count” aggregator. You can do that in tranquility by setting aggregators to:

ImmutableList.of(new CountAggregatorFactory("cnt"));