[CONTROLLER-1876] cluster leadership changes nodes when isolated follower is rejoined to cluster Created: 30/Nov/18  Updated: 08/Dec/18  Resolved: 08/Dec/18

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: Neon, Fluorine SR2

Type: Bug Priority: High
Reporter: Jamo Luhrsen Assignee: Tom Pantelis
Resolution: Duplicate Votes: 0
Labels: csit:3node, csit:failures, csit:sporadic
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

After a follower node is isolated, the remaining two nodes still form a majority
and a healthy cluster with one leader and one follower. Sometimes, in our automated
CSIT tests, the leader moves to a different node when the isolated follower rejoins.

The expectation is that leadership should not change in this case. This is a
sporadic failure in our controller CSIT jobs.

We need to understand why this change is happening and determine if it's
acceptable and/or expected. If so, we need to change the tests to account for
this case. It does feel inefficient for this to happen, and it may be some
condition we can avoid with a fix somewhere on the controller side.
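
The expectation above can be sketched as a small check. This is only an
illustration, not the actual robot test: the helper names, the member names
(odl1..odl3), and the role strings "Leader"/"Follower" are assumptions, and the
role snapshots are hard-coded rather than read from a live cluster.

```python
from typing import Dict, Optional


def find_leader(roles: Dict[str, str]) -> Optional[str]:
    """Return the single member reporting the Leader role, or None.

    `roles` maps a member name to its reported role (hypothetical format).
    """
    leaders = [member for member, role in roles.items() if role == "Leader"]
    if len(leaders) != 1:
        # No leader, or more than one member claiming leadership.
        return None
    return leaders[0]


def leader_unchanged(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True when both snapshots have exactly one leader and it is the same member."""
    leader_before = find_leader(before)
    leader_after = find_leader(after)
    return leader_before is not None and leader_before == leader_after


# Hypothetical snapshots: odl3 is the isolated follower.
during_isolation = {"odl1": "Leader", "odl2": "Follower"}
after_rejoin_ok = {"odl1": "Leader", "odl2": "Follower", "odl3": "Follower"}
after_rejoin_moved = {"odl1": "Follower", "odl2": "Leader", "odl3": "Follower"}

if __name__ == "__main__":
    print(leader_unchanged(during_isolation, after_rejoin_ok))
    print(leader_unchanged(during_isolation, after_rejoin_moved))
```

The second snapshot pair corresponds to the sporadic failure described here:
the cluster is healthy after the rejoin, but the leader is a different member.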

example robot log

the karaf logs are there too:

odl 1
odl 2
odl 3

Looking quickly at the Jenkins robot graphs, this specific test case seems to
fail 4-5 times out of 30 tries (roughly 13-17%). I think this test case is also
present in multiple suites, so it may fail periodically there too, making the
failure more frequent overall. Either way, 4-5 times out of 30 is enough to work
with for now.



 Comments   
Comment by Tom Pantelis [ 08/Dec/18 ]

Duplicate of CONTROLLER-1865.
