Tranquility: multiple instances

Hi there!

I’m running 3 servers with the following config on each of them:

  • Historical + MiddleManager + Tranquility + Kafka
    (Druid Brokers, Coordinators and Overlords run on other servers.)

I have one topic with 3 partitions, one for each Kafka broker. I’m sending data to the three brokers, and so far everything works just fine.

The problem is that when I run the 3 Tranquility instances, only one of them ever receives the data and puts it into Druid. I copied exactly the same task JSON to the 3 servers, and I set these properties:

  “task.partitions” : “1”,
  “task.replicants” : “1”,

I already tried changing them to 3 in every possible combination. What I see in the Tranquility logs is that the instance receiving the data outputs this:

[ConsumerFetcherManager-1469704934225] Added fetcher for partitions ArrayBuffer([[pageviews,1], initOffset 25687 to broker id:2,host:host2,port:9092], [[pageviews,0], initOffset 25685 to broker id:1,host:host1,port:9092], [[pageviews,2], initOffset 25686 to broker id:3,host:host3,port:9092])

It handles all the partitions…

Is there anything that I’m missing? Should I change something in Kafka?

Thank you.

By the way, I’m using Druid 0.9.1.1, Tranquility 0.8.2 and Kafka 0.10.

Hi all,

If someone can give us a hand, it would be nice. We still cannot parallelize Tranquility’s work.

Thank you!

Hi there,

Any hint? We have three Tranquility instances running, but only one of them is getting Kafka messages…

Thanks!

Hi Fede,

We had a similar issue at first (we were trying to parallelize Tranquility Server), and what worked for us was to put our Tranquility instances behind a load balancer.

We are not using Kafka, but hitting Tranquility Server directly via HTTP POST requests, so it may not be the same case as yours.

For our solution, you can refer to this thread:
https://groups.google.com/forum/#!topic/druid-user/90BMCxz22Ko

Hi pit_theo. We’re actually using Kafka, so I think this solution wouldn’t solve our issue.

But thank you very much either way.

Hmm, Tranquility Kafka uses the high-level Kafka consumer, which should balance partitions automatically if you start up multiple instances (assuming you have enough partitions, which it looks like you do). Is it the same instance every time that gets all the partitions? Or if you stop that one, will another one then get them all?

Are you passing in a key when publishing messages to Kafka? If all your messages are going to only a few partitions, some of your consumers will be sitting idle, I guess.

Hi Gian, that is exactly what happens: if I stop the one that is getting everything, one of the other two instances gets it all…

I have 3 partitions with replication factor 2 in Kafka, and they are well distributed across the servers. With kafka-topics.sh --describe I get:

By the way, if I restart the one that was getting everything, it gets everything again; it’s like the master.

Hi pp, I’m not sending a key, just the message. Could this be the cause?

Hi Fede, did you answer Gian’s questions? Tranquility Kafka is just acting as a high-level Kafka consumer.

Alternatively, we’ve recently added exactly-once consumption from Kafka via the new Kafka indexing task. You may be interested in trying that out as well.

Hi Fangjin, yes, we are planning to switch to the Kafka indexing service, but we haven’t yet.

What do you mean by Tranquility acting as a high-level Kafka consumer? Is something misconfigured?

Thanks!

Hey Fede,

What FJ means is that Tranquility is not actually doing anything special to control which process gets which Kafka partition. It just runs a high-level Kafka consumer, and that library makes the decisions. If it’s not balancing well, you might have better luck with the Kafka mailing list. Perhaps adding more partitions would help?
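
For reference, here is a minimal sketch of that high-level consumer pattern using the old kafka.consumer API (the ZooKeeper address and group id below are placeholder assumptions; Tranquility wires all of this up internally). Every process that starts with the same group.id joins one consumer group, and the group rebalance, not the process itself, decides which partitions it owns:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class GroupConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181"); // placeholder ZK quorum
            props.put("group.id", "tranquility-kafka");    // all instances share this id

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for one stream on the "pageviews" topic; which partitions feed
            // it is decided by the group rebalance, not by this process.
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("pageviews", 1));

            for (MessageAndMetadata<byte[], byte[]> msg : streams.get("pageviews").get(0)) {
                System.out.println("partition=" + msg.partition() + " offset=" + msg.offset());
            }
        }
    }

If three such processes share a group id against a 3-partition topic, a healthy rebalance should leave each one owning a single partition.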

Hello, Fede.
We’ve experienced the same problem with our setup. In our case it was a Kafka write issue.

First, check the partition offsets with bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker. If you didn’t specify a partition key (as was our case), the Kafka producer by default writes all data to one partition for 10 minutes. So try setting a partition key; a hash of the row is a good choice to start.
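
For illustration, a keyed send with the 0.10 Java producer might look like the sketch below (the broker list, topic name, and hash-of-the-row key are placeholder assumptions; any key that distributes evenly across your rows will do). The producer’s default partitioner hashes the key to pick a partition, so keyed messages spread over all three partitions instead of sticking to one:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedSendSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "host1:9092,host2:9092,host3:9092"); // placeholder brokers
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String row = "{\"timestamp\":\"2016-08-01T12:00:00Z\",\"page\":\"/home\"}";
                // Use a hash of the row as the key: the default partitioner hashes
                // the key, so keyed records spread across all partitions.
                String key = Integer.toString(row.hashCode());
                producer.send(new ProducerRecord<>("pageviews", key, row));
            }
        }
    }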