Bug
Resolution: Duplicate
High
None
High
After isolating a follower node, the remaining two nodes form a healthy majority
cluster with one leader and one follower. Sometimes, in our CSIT automated
tests, when the isolated follower rejoins, leadership moves to a different node.
The expectation is that leadership should not change in this case. This is a
sporadic failure in our controller CSIT jobs.
We need to understand why this change is happening and determine if it's
acceptable and/or expected. If so, we need to change the tests to account for
this case. It does feel inefficient for this to happen, and it may be some
condition we can avoid with a fix somewhere on the controller side.
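To make the expectation concrete, here is a minimal sketch of the check in Python. It is not the actual CSIT Robot code: `leader_changed`, `FakeCluster`, and the `member-N` names are hypothetical stand-ins, and the fake cluster only models "leadership moves on rejoin" as a switch, not the real election logic.

```python
from typing import Callable, Optional


def leader_changed(
    get_leader: Callable[[], Optional[str]],
    isolate: Callable[[str], None],
    rejoin: Callable[[str], None],
    follower: str,
) -> bool:
    """Return True if leadership moved after the follower was isolated and rejoined."""
    isolate(follower)
    leader_before = get_leader()  # leader of the remaining two-node majority
    rejoin(follower)
    leader_after = get_leader()  # leader once the third node is back
    return leader_before != leader_after


class FakeCluster:
    """Hypothetical in-memory stand-in for a three-node cluster (illustration only)."""

    def __init__(self, leader: str, move_leader_on_rejoin: bool = False) -> None:
        self.leader = leader
        self.isolated = set()
        self.move_leader_on_rejoin = move_leader_on_rejoin

    def get_leader(self) -> Optional[str]:
        return self.leader

    def isolate(self, node: str) -> None:
        self.isolated.add(node)

    def rejoin(self, node: str) -> None:
        self.isolated.discard(node)
        if self.move_leader_on_rejoin:
            # Models the sporadic failure: the rejoining node ends up as leader.
            self.leader = node


if __name__ == "__main__":
    stable = FakeCluster("member-1")
    print(leader_changed(stable.get_leader, stable.isolate, stable.rejoin, "member-3"))
```

If the leadership change turns out to be acceptable, the test could log the result of a check like this instead of failing on it.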
The Karaf logs are there too:
Looking quickly at the Jenkins Robot graphs, it seems this specific test case
fails roughly 4-5 times out of 30 runs. I think this test case is in multiple
suites, though, so it possibly happens periodically there too, making it
more or less frequent overall. Either way, 4-5 times out of 30 is enough to
work with for now.