[CONTROLLER-1491] Entity Ownership Service: support graceful state handoff Created: 24/Feb/16  Updated: 25/Jul/23  Resolved: 24/Aug/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Robert Varga Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
is blocked by MDSAL-158 EntityOwnershipService does not speci... Confirmed
External issue ID: 5420
Priority: Low

 Description   

The current entity ownership model does not support stateful failovers, such as changing ownership of an OF (OpenFlow) connection.

Essentially, there are multiple resources attached to a particular entity: the TCP session, which determines the ability to be a candidate, and a datastore transaction chain, which is attached to the master role.

When a master becomes a slave as part of an election, it needs to clean up datastore state before the new master can resume operations. This effectively means that if there is a master and the election results indicate that the role needs to move to another candidate, we need a 'stopping' transition, which is asynchronous and completed by the outgoing master. Only once the old master has stopped can the new master be elected.

In the OF case, this would mean that the outgoing leader would initiate TransactionChain.close() and finish 'stopping' when the transaction chain callback indicates that the shutdown procedure has completed.
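
The 'stopping' transition described above could be sketched roughly as follows. This is only an illustration under assumptions: HandoffSketch, Candidate, RecordingCandidate and moveOwnership() are hypothetical names, not part of the EOS API, and the asynchronous cleanup (closing the transaction chain in the OF case) is modelled with a plain CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of a graceful handoff; none of these types exist in EOS. */
final class HandoffSketch {
    enum State { OWNED, STOPPING }

    /** Hypothetical view of a candidate as seen by the election logic. */
    interface Candidate {
        String name();
        /** Starts asynchronous cleanup; the future completes once cleanup is done. */
        CompletableFuture<Void> stop();
        void becomeMaster();
    }

    /** Simple candidate whose cleanup is completed by hand, for demonstration. */
    static final class RecordingCandidate implements Candidate {
        final String name;
        final CompletableFuture<Void> cleanup = new CompletableFuture<>();
        volatile boolean master;

        RecordingCandidate(String name) { this.name = name; }
        @Override public String name() { return name; }
        @Override public CompletableFuture<Void> stop() {
            master = false;      // the role is given up as soon as stopping starts
            return cleanup;      // completed when e.g. the transaction chain has closed
        }
        @Override public void becomeMaster() { master = true; }
    }

    private Candidate owner;
    private State state = State.OWNED;

    HandoffSketch(Candidate initialOwner) {
        owner = initialOwner;
        initialOwner.becomeMaster();
    }

    synchronized State state() { return state; }
    synchronized String ownerName() { return owner.name(); }

    /** Moves ownership to next, but only after the outgoing master has finished stopping. */
    CompletableFuture<Void> moveOwnership(Candidate next) {
        final Candidate outgoing;
        synchronized (this) {
            if (state == State.STOPPING) {
                throw new IllegalStateException("handoff already in progress");
            }
            state = State.STOPPING;
            outgoing = owner;
        }
        return outgoing.stop().thenRun(() -> {
            synchronized (this) {
                owner = next;
                state = State.OWNED;
            }
            next.becomeMaster();
        });
    }
}
```

The point of the sketch is the ordering: the new master is not notified until the future returned by the outgoing master's stop() has completed, so the two never hold the master role at the same time.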



 Comments   
Comment by Robert Varga [ 24/Feb/16 ]

Further analysis is required: it seems that as long as access to EOS and DS share fate (i.e. they reliably fail at the same time), we can get away with just ensuring that EOS prefers master stability and never takes ownership away from a candidate which is still alive.

If that is the case, it would be sufficient to expose APIs to define entity types such that each entity type has a well-known and application-controlled selection strategy.
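
A per-entity-type selection strategy of this kind might look roughly like the sketch below. All names here (SelectionStrategySketch, SelectionStrategy, PREFER_STABILITY, the "openflow" entity type) are hypothetical and chosen for illustration only; the sketch assumes that candidates are identified by member name and that the caller supplies the set of candidates which are still alive.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of application-controlled owner selection per entity type. */
final class SelectionStrategySketch {
    /** Hypothetical strategy: picks an owner from the live candidates. */
    interface SelectionStrategy {
        /** Returns the selected owner, or null if there are no live candidates. */
        String selectOwner(String currentOwner, Collection<String> liveCandidates);
    }

    /** Default: take the first live candidate in iteration order. */
    static final SelectionStrategy FIRST_CANDIDATE =
        (current, live) -> live.isEmpty() ? null : live.iterator().next();

    /** Master stability: never take ownership away from an owner that is still alive. */
    static final SelectionStrategy PREFER_STABILITY =
        (current, live) -> current != null && live.contains(current)
            ? current
            : FIRST_CANDIDATE.selectOwner(current, live);

    private final Map<String, SelectionStrategy> byEntityType = new ConcurrentHashMap<>();

    /** Associates an entity type with a well-known strategy. */
    void register(String entityType, SelectionStrategy strategy) {
        byEntityType.put(entityType, strategy);
    }

    /** Runs the election for one entity of the given type. */
    String elect(String entityType, String currentOwner, Collection<String> liveCandidates) {
        return byEntityType.getOrDefault(entityType, FIRST_CANDIDATE)
            .selectOwner(currentOwner, liveCandidates);
    }
}
```

With PREFER_STABILITY registered for an entity type, ownership only moves when the current owner drops out of the live candidate set, which is the behaviour the comment above relies on.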

Comment by Robert Varga [ 24/Aug/17 ]

Cluster Singleton Service has a solution for this problem.

Generated at Wed Feb 07 19:55:41 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.