realtime node replication

Hi,

I’m running a replication test with two realtime nodes that are set up with the same spec file for replication. I don’t use Tranquility or Kafka; instead I use the receiver firehose and update the data via POST requests to the individual realtime nodes.

In my test, the broker seemed to favour realtime node1 for querying. When I shut down node1, the broker used node2 from then on, which is fine. But when node1 came back online, the broker used node1 for queries again, even though data updates had arrived in the meantime and made it into node2's segments.

Is it possible to mark node1's segments as dirty when the node has been down for a while, so that the broker picks the more accurate node2 in this example and only uses node1 again if node2 disappears as well?

Cheers

Thorsten

Or, more generally: how do broker queries, and later the handoff to cold storage, work for realtime nodes that are set up for replication (via the same spec file)?
When the replicated realtime nodes hold segments for the same interval, which node's segment will be chosen?

Hi, see inline.

> Hi,
>
> I’m running a replication test with two realtime nodes that are set up with the same spec file for replication. I don’t use Tranquility or Kafka; instead I use the receiver firehose and update the data via POST requests to the individual realtime nodes.
>
> In my test, the broker seemed to favour realtime node1 for querying. When I shut down node1, the broker used node2 from then on, which is fine. But when node1 came back online, the broker used node1 for queries again, even though data updates had arrived in the meantime and made it into node2's segments.

Hmm, by default the broker should randomly select one of the two nodes to query. You can change this behaviour with druid.broker.balancer.type: http://druid.io/docs/latest/configuration/broker.html
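To picture what the two balancer settings do when several nodes serve the same segment, here is a small, self-contained Java sketch. It assumes the documented values `random` (the default) and `connectionCount`; the `Node` record and the `pick...` methods are hypothetical stand-ins, not Druid's own classes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a server that serves a given segment.
record Node(String name, int openConnections) {}

class BalancerSketch {
  // Models "random": pick any replica uniformly (throws if the list is empty).
  static Node pickRandom(List<Node> replicas, Random rng) {
    return replicas.get(rng.nextInt(replicas.size()));
  }

  // Models "connectionCount": pick the replica with the fewest open connections.
  static Node pickByConnectionCount(List<Node> replicas) {
    return replicas.stream()
        .min(Comparator.comparingInt(Node::openConnections))
        .orElseThrow();
  }

  public static void main(String[] args) {
    List<Node> replicas = List.of(new Node("node1", 3), new Node("node2", 1));
    System.out.println(pickByConnectionCount(replicas).name()); // prints "node2"
    System.out.println(pickRandom(replicas, new Random(42)).name());
  }
}
```

Neither rule looks at how up to date a replica is, which fits node1 being picked again right after it rejoined.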

> Is it possible to mark node1's segments as dirty when the node has been down for a while, so that the broker picks the more accurate node2 in this example and only uses node1 again if node2 disappears as well?

You can write a custom server selector strategy (https://github.com/druid-io/druid/blob/20fdb627d99b3cf0d29c85270f801868d8c4d7c8/server/src/main/java/io/druid/client/selector/ServerSelectorStrategy.java) to tell the broker which nodes to favor. Alternatively, you can look at Tranquility-based ingestion, where a task fails as soon as a node doing realtime ingestion fails, so queries always go to the most up-to-date node.
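A minimal sketch of the "dirty node" idea from the question follows, in Java because that is what a Druid extension would be written in. Everything here (`StaleAwareSelector`, `markStale`, `markFresh`) is hypothetical. It is plain Java, not an implementation of the linked `ServerSelectorStrategy` interface, and it assumes that something outside the broker reports which nodes missed updates while they were down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical selector: prefers replicas that have not been marked stale
// (e.g. a node that missed updates while it was down) and falls back to
// stale replicas only when no fresh replica is available.
class StaleAwareSelector {
  private final Set<String> staleNodes = ConcurrentHashMap.newKeySet();
  private final Random rng;

  StaleAwareSelector(Random rng) {
    this.rng = rng;
  }

  void markStale(String node) { staleNodes.add(node); }

  void markFresh(String node) { staleNodes.remove(node); }

  String pick(List<String> replicas) {
    if (replicas.isEmpty()) {
      throw new IllegalArgumentException("no replica serves this segment");
    }
    List<String> fresh = new ArrayList<>();
    for (String node : replicas) {
      if (!staleNodes.contains(node)) {
        fresh.add(node);
      }
    }
    // Choose randomly among fresh replicas; if every replica is stale, use any of them.
    List<String> candidates = fresh.isEmpty() ? replicas : fresh;
    return candidates.get(rng.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    StaleAwareSelector selector = new StaleAwareSelector(new Random());
    List<String> replicas = List.of("node1", "node2");
    selector.markStale("node1");                          // node1 rejoined after missing updates
    System.out.println(selector.pick(replicas));          // prints "node2"
    System.out.println(selector.pick(List.of("node1")));  // prints "node1" (fallback)
  }
}
```

Falling back to stale replicas keeps queries working if node2 also disappears, which matches the behaviour asked for above. Deciding when a node counts as stale, and when it has caught up, is the hard part and is left open in this sketch.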

Hi Fangjin,

Thanks for the reply and the information. Further tests confirmed the broker's random selection behaviour.

So does the druid.broker.balancer.type property set the behaviour for queries to historical nodes only (as mentioned in the docs), or for queries to realtime nodes as well?

If it covers realtime nodes too, I’ll definitely have a look at the ServerSelectorStrategy.

We are running a C/C++, Python and Node.js-based stack, so I’d like to avoid using the Tranquility library and Java/Scala for now if possible.

Cheers

Thorsten

The balancer type applies to both historical and realtime nodes.