[CONTROLLER-1495] Prevent Follower from becoming Candidate when Akka Cluster reports Leader as Reachable Created: 02/Mar/16  Updated: 19/Oct/17  Resolved: 22/Aug/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Beryllium
Fix Version/s: None

Type: Improvement
Reporter: Moiz Raja Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All



 Description   

Our current Raft implementation, which follows the spec, relies on simple heartbeats (AppendEntries) to detect whether the Leader is still alive. If the Follower misses 'n' heartbeats (configurable), it considers the Leader to be down and becomes a Candidate. Often, however, heartbeats are missed simply because a major GC on the Leader pauses it and stops it from sending AppendEntries.

Akka Cluster, which uses the Phi Accrual failure detector algorithm, is smarter and can take GC delays into account when reporting a cluster member as Reachable/Unreachable. It makes sense to use this feedback from Akka Cluster to determine whether a Follower should switch to Candidate.

The basic idea is that when an election timeout occurs on the Follower, it should check whether the Leader is Reachable according to Akka Cluster. If it is, the Follower should not become a Candidate and should instead reschedule the election timeout. The new election timeout, however, should be 1/2 (or some other fraction) of the previous election timeout. This ensures that if Akka reports the Leader as Unreachable just after an election timeout, a new election can still be triggered quickly.
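
A minimal sketch of the decision described above, for illustration only. All class, method and field names here are hypothetical and are not taken from the controller code base. The reachability flag stands in for whatever Akka Cluster reports, and the sketch assumes the shortened timeout is derived from the configured base timeout, using the 1/2 fraction suggested above.

```java
import java.time.Duration;

/** Hypothetical sketch; these names are assumptions, not the controller's API. */
final class ElectionTimeoutSketch {

    /** Outcome of one election timeout on the Follower. */
    static final class Decision {
        final boolean becomeCandidate;
        final Duration nextTimeout;

        Decision(boolean becomeCandidate, Duration nextTimeout) {
            this.becomeCandidate = becomeCandidate;
            this.nextTimeout = nextTimeout;
        }
    }

    /** Assumed divisor: 1/2 of the base timeout; the fraction could be made configurable. */
    static final long TIMEOUT_DIVISOR = 2;

    /**
     * @param leaderReachable whether Akka Cluster currently reports the Leader as Reachable
     * @param baseTimeout     the configured (full) election timeout
     */
    static Decision onElectionTimeout(boolean leaderReachable, Duration baseTimeout) {
        if (leaderReachable) {
            // Leader is probably just paused (e.g. GC): stay Follower, but check again sooner.
            return new Decision(false, baseTimeout.dividedBy(TIMEOUT_DIVISOR));
        }
        // Leader is Unreachable: proceed with the normal Raft transition to Candidate.
        return new Decision(true, baseTimeout);
    }
}
```

With a 10 second base timeout, a Reachable Leader yields "stay Follower, re-check in 5 seconds", so an Unreachable report that arrives just after the first timeout is acted on within half the usual interval.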



 Comments   
Comment by Tom Pantelis [ 03/Mar/16 ]

This has been on my TODO list for a while. We have to be careful, though, when we want to force a re-election via an explicit ElectionTimeout message, e.g. for leadership transfer. Of course, we could use a separate message or a flag for that.
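
As an illustration of the "flag" option mentioned in this comment, here is a rough sketch. Only the ElectionTimeout message name comes from the comment; the `forced` field and the method name are hypothetical and not part of the actual patch.

```java
/** Hypothetical sketch; the 'forced' flag is an assumption, not the controller's API. */
final class ForcedElectionSketch {

    /** Stand-in for the ElectionTimeout message, extended with an assumed flag. */
    static final class ElectionTimeout {
        final boolean forced;

        ElectionTimeout(boolean forced) {
            this.forced = forced;
        }
    }

    /**
     * A forced timeout (e.g. sent for leadership transfer) bypasses the reachability check;
     * a regular timer-driven timeout only starts an election if the Leader is Unreachable.
     */
    static boolean shouldBecomeCandidate(ElectionTimeout message, boolean leaderReachable) {
        return message.forced || !leaderReachable;
    }
}
```

A separate message type would achieve the same effect; the flag simply keeps a single message type for both the timer-driven and the explicit case.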

Comment by Tom Pantelis [ 22/Aug/16 ]

Patch https://git.opendaylight.org/gerrit/#/c/43755/
