Type: Bug
Resolution: Done
Operating System: All
Platform: All
8524
This manifests as a CSIT failure [0]. The UnreachableMember is a separate issue (see CONTROLLER-1645, for example). It is possible that cluster members end up with an inconsistent shard configuration.
The karaf.log on member-1 [1] shows the replica removal started at 01:55:43,472; then this happened:
2017-05-22 01:56:06,569 | WARN | lt-dispatcher-32 | aftActorLeadershipTransferCohort | 193 - org.opendaylight.controller.sal-akka-raft - 1.5.0.Carbon | member-1-shard-default-config: Failed to transfer leadership in 10.01 s
2017-05-22 01:56:06,572 | INFO | lt-dispatcher-22 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | Stopping Shard member-1-shard-default-config
Finally, the test teardown started adding the replica back at 01:56:28,959.
Thus, even though the test waits 45 seconds, the members only had about 20 seconds to realize the previous leader was gone (we can add more time to the test if needed).
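The "roughly 20 seconds" figure above follows directly from the timestamps in the member-1 log excerpts; a minimal sketch of the arithmetic (timestamps copied from the log lines quoted in this report):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps taken from the member-1 karaf.log excerpts above
removal_start = datetime.strptime("2017-05-22 01:55:43,472", FMT)  # replica removal starts
shard_stopped = datetime.strptime("2017-05-22 01:56:06,572", FMT)  # "Stopping Shard" logged
readd_start   = datetime.strptime("2017-05-22 01:56:28,959", FMT)  # teardown re-adds replica

# Window the remaining members had to notice the old leader was gone
# before the replica was added back:
window = (readd_start - shard_stopped).total_seconds()
print(window)  # 22.387, i.e. the "about 20 seconds" noted above
```

Note that the 45-second test wait is mostly consumed by the failed 10-second leadership transfer and the surrounding teardown steps, which is why only this much smaller window remains for a new election.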
As the member-3 karaf.log [2] shows no activity between 01:56:03,165 and 01:56:56,244, it looks as if member-1 was perhaps somehow still the leader, but the "has no leader" response [3] from member-1 when adding the replica back proves there really was no leader, at least from member-1's point of view.
Every member shows multiple UnreachableMember messages. It is not clear whether the subsequent ones are the cause or the result of the missing leader.
[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t1-k2-k13-k1-k3-k1
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl1_karaf.log.gz
[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl3_karaf.log.gz
[3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t1-k2-k14-k2-k3-k1-k4-k7-k1
Is blocked by:
CONTROLLER-1706 Large transaction traffic prevents leader to be moved (Resolved)