Type: Bug
Resolution: Done
Fix Version/s: None
Affects Version/s: None
Component/s: clustering
Labels: None
Environment:
Operating System: All
Platform: All

External issue ID: 8524

This manifests as a CSIT failure [0]. The UnreachableMember messages are a separate issue (see CONTROLLER-1645, for example). It is possible that cluster members end up with an inconsistent shard configuration.

Karaf.log on member-1 [1] shows the replica removal started at 01:55:43,472, then this happened:
2017-05-22 01:56:06,569 | WARN | lt-dispatcher-32 | aftActorLeadershipTransferCohort | 193 - org.opendaylight.controller.sal-akka-raft - 1.5.0.Carbon | member-1-shard-default-config: Failed to transfer leadership in 10.01 s
2017-05-22 01:56:06,572 | INFO | lt-dispatcher-22 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | Stopping Shard member-1-shard-default-config

Finally, the test teardown started adding the replica back at 01:56:28,959.
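Thus, even though the test waited about 45 seconds between starting the replica removal and adding the replica back, the remaining members had only about 22 seconds after the shard stopped to realize the previous leader was gone (we can add more time to the test if needed).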
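The timing argument above can be checked from the three timestamps quoted in this report. The following is a minimal sketch; the variable and function names are illustrative only and do not come from the test suite.

```python
from datetime import datetime

# Time-of-day format used in karaf.log (comma before the milliseconds).
FMT = "%H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds from start to end (same day assumed)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps quoted in this report (member-1 karaf.log and test log).
removal_started = "01:55:43,472"  # replica removal started
shard_stopped = "01:56:06,572"    # "Stopping Shard member-1-shard-default-config"
readd_started = "01:56:28,959"    # teardown started adding the replica back

total_wait = seconds_between(removal_started, readd_started)
election_window = seconds_between(shard_stopped, readd_started)

print(f"removal start to re-add: {total_wait:.3f} s")
print(f"shard stop to re-add:    {election_window:.3f} s")
```

The first interval is the wait seen by the test; the second is the time the other members actually had to notice the missing leader and elect a new one.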

The member-3 karaf.log [2] shows no activity between 01:56:03,165 and 01:56:56,244 (about 53 seconds), which suggests that member-1 was perhaps somehow still a leader. However, the "has no leader" response [3] from member-1 when adding the replica back proves there really was no leader, at least from member-1's point of view.
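Quiet periods like the one in the member-3 log can be located mechanically. The sketch below assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm |` timestamp, as in the member-1 lines quoted earlier; `find_gaps` is a hypothetical helper, not part of any OpenDaylight tooling.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a karaf.log entry, e.g. "2017-05-22 01:56:06,569 |".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \|")
TS_FMT = "%Y-%m-%d %H:%M:%S,%f"


def find_gaps(lines, min_gap_seconds):
    """Return (previous, current, gap_seconds) for consecutive entries
    that are at least min_gap_seconds apart."""
    gaps = []
    previous = None
    for line in lines:
        match = TS_RE.match(line)
        if match is None:
            continue  # continuation line (e.g. stack trace), no timestamp
        current = datetime.strptime(match.group(1), TS_FMT)
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


# The two member-1 entries quoted in this report (messages shortened).
sample = [
    "2017-05-22 01:56:06,569 | WARN | lt-dispatcher-32 | Failed to transfer leadership",
    "2017-05-22 01:56:06,572 | INFO | lt-dispatcher-22 | Stopping Shard",
]

for start, end, gap in find_gaps(sample, min_gap_seconds=0.001):
    print(f"{start.time()} -> {end.time()}: {gap:.3f} s without log activity")
```

Run against the full odl3_karaf.log with a threshold of, say, 30 seconds, the same function would be expected to report the silent interval described above.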

Every member shows multiple UnreachableMember messages. It is unclear whether the later ones are a cause or a consequence of the missing leader.

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t1-k2-k13-k1-k3-k1
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl1_karaf.log.gz
[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl3_karaf.log.gz
[3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t1-k2-k14-k2-k3-k1-k4-k7-k1

Is blocked by: CONTROLLER-1706 Large transaction traffic prevents leader to be moved (Resolved)

Assignee: Unassigned
Reporter: Vratko Polak
Votes: 0
Watchers: 2
Created: 22/May/17 8:47 AM
Updated: 25/Jul/23 8:24 AM
Resolved: 18/Sep/17 12:12 PM
