[CONTROLLER-1570] Leader state changed events received in reversed order Created: 15/Dec/16 Updated: 25/Jul/23 Resolved: 22/Dec/16 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vinh Nguyen | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 7391 |
| Description |
|
When the ODL node that is the leader of a shard is shut down, one of the remaining nodes receives two leader-changed events in the wrong order: the first event indicates that a new leader has been elected, and the second indicates that the shard leader is null. The correct order is the reverse.

The scenario is described below:

1) 6-node cluster (I am not sure whether having more than 3 nodes in the cluster contributes to the problem, because we always test with a 3-node cluster).

While the shard has no leader, the following exception is observed:

ReadFailedException {message=Error executeRead ReadData for path /(urn:opendaylight:neutron?revision=2015-07-12)neutron/networks, errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:neutron?revision=2015-07-12)neutron/networks, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-3-shard-default-config currently has no leader. Try again later.]]}

Below is an excerpt from the karaf log, filtered on 'onLeaderStateChanged' events, when ODL1 is shut down. Please note the events for 'shard-default-config':

2016-11-14 04:32:35,263 | INFO | ult-dispatcher-6 | ShardManager | 195 - org.opendaylight.controller.sal-distributed-datastore - 1.4.1.Boron-SR1 | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-3-shard-default-operational, leaderId=null, leaderPayloadVersion=5] |
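The sketch below is only an illustrative caller-side mitigation, not part of the reported issue or its fix: it retries a read that fails with DataStoreUnavailableException (the "currently has no leader. Try again later." case shown above). It assumes the Boron-era MD-SAL binding DataBroker API; the retry count and back-off are made-up values for the example.

```java
import com.google.common.base.Optional;
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.ReadOnlyTransaction;
import org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.controller.md.sal.common.api.data.ReadFailedException;
import org.opendaylight.yangtools.yang.binding.DataObject;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

public final class ShardReadRetry {

    // Hypothetical helper: retry a datastore read while the shard reports no leader.
    static <T extends DataObject> Optional<T> readWithRetry(final DataBroker broker,
            final LogicalDatastoreType store, final InstanceIdentifier<T> path)
            throws ReadFailedException, InterruptedException {
        final int maxAttempts = 3;                  // arbitrary choice for this sketch
        for (int attempt = 1; ; attempt++) {
            final ReadOnlyTransaction tx = broker.newReadOnlyTransaction();
            try {
                return tx.read(store, path).checkedGet();
            } catch (ReadFailedException e) {
                // The "Shard ... currently has no leader" failure surfaces with
                // DataStoreUnavailableException as the cause; anything else is rethrown.
                if (!(e.getCause() instanceof DataStoreUnavailableException) || attempt == maxAttempts) {
                    throw e;
                }
            } finally {
                tx.close();
            }
            Thread.sleep(500);                      // arbitrary back-off for this sketch
        }
    }

    private ShardReadRetry() {
    }
}
```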
| Comments |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL1_karaf.zip has been added with description: karaf.log ODL1 |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL2_karaf.zip has been added with description: karaf.log ODL2 |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL3_karaf.zip has been added with description: karaf.log ODL3 - this ODL received the leader-changed events in the wrong order |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL4_karaf.zip has been added with description: karaf.log ODL4 |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL5_karaf.zip has been added with description: karaf.log ODL5 |
| Comment by Vinh Nguyen [ 16/Dec/16 ] |
|
Attachment ODL6_karaf.zip has been added with description: karaf.log ODL6 |
| Comment by Tom Pantelis [ 21/Dec/16 ] |
|
Was ODL3/member-3 a non-voting member by any chance? |
| Comment by Tom Pantelis [ 21/Dec/16 ] |
|
Strike that. I think I know the problem. On leader transition, the current leader first sends a LeaderTransitioning message to each follower to tell them the current leader is being deposed. Each follower then emits a LeaderStateChanged event with a null leaderId, which is picked up by the ShardManager to delay subsequent transaction activity for the shard until a new leader is elected. However, it is possible for the LeaderTransitioning message to reach a follower only after the leader transition has already occurred (e.g. due to dispatching delay in the caller or the network). This results in the LeaderStateChanged event with a null leaderId being delivered after the LeaderStateChanged with the new leaderId. I wrote a unit test that reproduces it.

We need to handle LeaderTransitioning messages in a CAS-like manner, i.e. include the leaderId in the LeaderTransitioning message and only issue the LeaderStateChanged event with a null leaderId if the follower's current leaderId matches the leaderId in the LeaderTransitioning message. |
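A minimal sketch of the CAS-like handling described above, under the assumption that these simplified classes stand in for the real RaftActor/ShardManager plumbing; LeaderTransitioning's field, FollowerShardState, and publishLeaderId are illustrative names, not the actual controller implementation.

```java
import java.util.function.Consumer;

// Hypothetical, simplified types -- not the real org.opendaylight.controller.cluster.raft classes.
final class LeaderTransitioning {
    private final String leaderId;              // id of the leader being deposed

    LeaderTransitioning(final String leaderId) {
        this.leaderId = leaderId;
    }

    String getLeaderId() {
        return leaderId;
    }
}

final class FollowerShardState {
    private String currentLeaderId;             // updated from every LeaderStateChanged seen locally

    void onLeaderStateChanged(final String newLeaderId) {
        currentLeaderId = newLeaderId;
    }

    /**
     * CAS-like check: publish a null-leader LeaderStateChanged only if the leader named in
     * the LeaderTransitioning message is still the leader we currently know about. If the
     * new leader's LeaderStateChanged has already arrived, the stale LeaderTransitioning is
     * ignored, so the null-leader event can no longer be delivered after the new-leader one.
     */
    void onLeaderTransitioning(final LeaderTransitioning message, final Consumer<String> publishLeaderId) {
        if (message.getLeaderId() != null && message.getLeaderId().equals(currentLeaderId)) {
            publishLeaderId.accept(null);       // old leader deposed, no new leader known yet
        }
    }
}
```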
| Comment by Tom Pantelis [ 21/Dec/16 ] |