[MDSAL-231] Singleton Clustering : No notification to cluster nodes if the entity does not have an owner Created: 18/Feb/17  Updated: 09/Jan/24

Status: Confirmed
Project: mdsal
Component/s: CSS
Affects Version/s: None
Fix Version/s: 14.0.0

Type: New Feature Priority: High
Reporter: Anil Vishnoi Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Relates
relates to OPNFLWPLUG-1072 ContextChainHolderImpl encroaches on ... Confirmed
relates to MDSAL-465 Revise Cluster Singleton API Confirmed

 Description   

With the current implementation of the singleton clustering service, it provides two notifications for an entity: instantiateServiceInstance() and closeServiceInstance(). Whenever an entity is unregistered, closeServiceInstance() is called so that the application instance can do the cleanup. We encountered an issue with openflowplugin clustering that can't be solved using these two existing notifications. Assume a scenario where a device is connected to only one controller, so that specific controller registers that device as an entity and gets the ownership. If this controller dies, the other nodes in the cluster won't get any notification about the status of that device entity, because they are not registered candidates. So all the data written by the owner controller to the data store will remain there.

EntityOwnershipService does notify non-candidate nodes if an entity does not have any owner (isOwner=false, wasOwner=false, hasOwner=false). To resolve the above-mentioned issue, I think the singleton service should expose a new notification (e.g. noOwnerFound() or noOwnerElected()) that gets triggered in that scenario (isOwner=false, wasOwner=false, hasOwner=false), so that the other controllers can do the required clean-up. Currently openflowplugin clustering is pretty much broken, because we encountered two issues that can't be solved using singleton clustering without a notification similar to what I mentioned above.
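A minimal sketch of what this could look like on top of the existing ClusterSingletonService contract (signatures approximate for the current API; the noOwnerFound() callback is the hypothetical addition proposed here):

    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonService;
    import org.opendaylight.mdsal.singleton.common.api.ServiceGroupIdentifier;

    public class DeviceSingletonService implements ClusterSingletonService {
        private final ServiceGroupIdentifier identifier;

        public DeviceSingletonService(ServiceGroupIdentifier identifier) {
            this.identifier = identifier;
        }

        @Override
        public ServiceGroupIdentifier getIdentifier() {
            return identifier;
        }

        @Override
        public void instantiateServiceInstance() {
            // This node was elected owner: take mastership of the device and
            // start writing its data to the operational data store.
        }

        @Override
        public ListenableFuture<Void> closeServiceInstance() {
            // This node is giving up ownership: release local resources.
            return Futures.immediateFuture(null);
        }

        // Hypothetical callback proposed in this issue (not part of the API today):
        // invoked on every node, candidate or not, when the entity ends up with no
        // owner at all (isOwner=false, wasOwner=false, hasOwner=false), so that any
        // node can remove the stale device data.
        public void noOwnerFound() {
        }
    }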

We are planning to explore using an EOS ownership change listener together with the singleton clustering service to resolve these issues, but if that doesn't work, this bug is pretty much a blocker for us.
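For reference, the EOS-based check looks roughly like the sketch below (accessor names are approximate and differ between the controller and mdsal EOS APIs):

    import org.opendaylight.mdsal.eos.binding.api.EntityOwnershipChange;
    import org.opendaylight.mdsal.eos.binding.api.EntityOwnershipListener;

    public class DeviceOwnershipListener implements EntityOwnershipListener {
        @Override
        public void ownershipChanged(EntityOwnershipChange change) {
            // Accessor names are approximate; the change state carries the three flags.
            boolean isOwner = change.getState().isOwner();
            boolean wasOwner = change.getState().wasOwner();
            boolean hasOwner = change.getState().hasOwner();

            if (!isOwner && !wasOwner && !hasOwner) {
                // The entity is owner-less. Every node that registered this listener
                // sees the event, so any of them can remove the stale device data.
            }
        }
    }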

Please let me know if you need more details to clearly understand the issue.



 Comments   
Comment by Robert Varga [ 20/Feb/17 ]

I think this needs a more thorough analysis as to what is the problem and why, along with failure modes.

The description leads me to believe that stale data during a partition is a problem – with the implicit assumption that the owner node is in the minority partition and does not sever the connection to the switch (and hence the switch does not connect to another node).

We are past API freeze and this is a very sensitive change, so the mechanics need to be considered very carefully.

Comment by Anil Vishnoi [ 20/Feb/17 ]

(In reply to Robert Varga from comment #1)
> I think this needs a more thorough analysis as to what is the problem and
> why, along with failure modes.
>
> The description leads me to believe that stale data during a partition is a
> problem – with the implicit assumption that the owner node is in the
> minority partition and does not sever the connection to the switch (and
> hence the switch does not connect to another node).
>
I think at a high level the problem is that the singleton clustering service doesn't notify candidate/non-candidate nodes when an entity has no owner (irrespective of whether other nodes registered their candidature for that specific entity). Without this notification, we are currently encountering the following two issues:

Issue 1:
(1) Connect a switch to only one controller in the cluster
(2) Kill that controller
(3) Stale data remains in the data store

Issue 2:

(1) Connect the switch to all the controllers in the cluster
(2) Disconnect the switch from its current owner controller
(3) This triggers closeServiceInstance() on the current owner controller
(4) Since the service instance is being closed, it removes the data from the data store
(5) That generates a data change notification for listener applications, which interpret it as the switch disconnecting from the controller, even though in reality the switch is still connected to the other cluster nodes.
(6) Meanwhile, another owner is elected, instantiateServiceInstance() is called, and that instance puts the switch data back into the data store
(7) Step 6 generates another data change notification with the switch data added, which is interpreted as the switch connecting to the controller. That is wrong, because the switch was already connected.

All this happens because each instance does its own clean-up: it can't rely on the other nodes to do it, since it has no way to know whether they registered themselves as candidates for that entity.
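To illustrate steps (3)-(5): the defensive clean-up each instance is forced to do on close looks roughly like the following simplified sketch (not the actual openflowplugin code; the broker wiring and device path are assumed):

    import com.google.common.util.concurrent.ListenableFuture;
    import org.opendaylight.controller.md.sal.binding.api.DataBroker;
    import org.opendaylight.controller.md.sal.binding.api.WriteTransaction;
    import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
    import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

    // Simplified clean-up performed by the losing owner (not the real OFP code).
    public class DeviceCleanup {
        private final DataBroker dataBroker;
        private final InstanceIdentifier<?> deviceNodePath;

        public DeviceCleanup(DataBroker dataBroker, InstanceIdentifier<?> deviceNodePath) {
            this.dataBroker = dataBroker;
            this.deviceNodePath = deviceNodePath;
        }

        // Called from closeServiceInstance(): this node cannot tell whether any other
        // node holds a candidate registration for the device, so it deletes defensively...
        public ListenableFuture<Void> removeDeviceData() {
            WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
            tx.delete(LogicalDatastoreType.OPERATIONAL, deviceNodePath);
            // ...which fires a "node removed" data change notification even though the
            // switch is still connected elsewhere; the newly elected owner then writes
            // the data back (steps 6-7 above), firing a second, misleading notification.
            return tx.submit();
        }
    }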

> We are past API freeze and this is a very sensitive change, so the mechanics
> need to be considered very carefully.
You mean adding this notification in the singleton clustering service?

Comment by Robert Varga [ 27/Feb/17 ]

A pure notification scheme will not suffice, because the two scenarios are really quite different and transitions between them require synchronization.

At the end of the day, what happens to the data store is up to OFP. From the description it is clear that the data subtree is owned either:

  • a 'live' instance connected to the switch, updating the data
  • a 'dead' instance, making sure the data is erased

Note that at any given time, even in a partition, the data is logically owned by an active component, which performs an action on the data store – which is non-atomic and hence requires start/stop transitions.

The set of nodes which are eligible for registering an instance is different:

  • live instances are registered only by nodes which have the switch connected
  • dead instances are registered by all nodes

Live instances are always preferred before dead ones, which implies a simple feature addition: the notion of instance priority, which governs selection of ownership. Higher-priority instances are always preferred to lower-priority instances in a cluster.

A high priority instance is registered when a switch is connected. When a switch disconnects it does not delete data. Therefore an inter-node switch failover will not incur a deletion flap (just a pure overwrite once the initial sync is done).

A low priority instance is registered on every node when a switch connects (for example as reported by DTCL). When a switch disappears, the instance is unregistered. When this instance becomes active, it will just delete the data tree and shut down. This will mean that as long as the data is in the data store, there exists at least one low-priority instance.

Is there a scenario which would not be addressed by this scheme?
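To make the usage side concrete, a sketch of how OFP might register the two instance kinds follows; the priority argument is a hypothetical extension, ClusterSingletonServiceProvider has no such overload today:

    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonService;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonServiceProvider;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonServiceRegistration;

    // Hypothetical priority-aware registrations, sketched for discussion only.
    public final class PrioritizedRegistrations {
        private static final int LIVE_PRIORITY = 2;
        private static final int DEAD_PRIORITY = 1;

        private PrioritizedRegistrations() {
        }

        // On the node(s) where the switch is actually connected: a "live" instance
        // that manages the device and never deletes data on disconnect.
        static ClusterSingletonServiceRegistration registerLive(ClusterSingletonServiceProvider provider,
                ClusterSingletonService liveService) {
            // Hypothetical overload taking a priority; the current API has no such argument.
            return provider.registerClusterSingletonService(liveService, LIVE_PRIORITY);
        }

        // On every node that learns about the switch (e.g. via a DTCL): a "dead"
        // instance that only becomes active once all live registrations are gone,
        // deletes the device subtree, and then unregisters itself.
        static ClusterSingletonServiceRegistration registerDead(ClusterSingletonServiceProvider provider,
                ClusterSingletonService deadService) {
            return provider.registerClusterSingletonService(deadService, DEAD_PRIORITY);
        }
    }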

Comment by Ryan Goulding [ 07/Mar/17 ]

Anil, do you have an update for this? We are blocked until we receive feedback. Thanks!

Comment by Tom Pantelis [ 08/Mar/17 ]

(In reply to Robert Varga from comment #3)
> A pure notification scheme will not suffice, because the two scenarios are
> really quite different and transitions between them require synchronization.
>
> At the end of the day, what happens to the data store is up to OFP. From the
> description it is clear that the data subtree is owned either:
> - a 'live' instance connected to the switch, updating the data
> - a 'dead' instance, making sure the data is erased
>
> Note that at any given time, even in a partition, the data is logically
> owned by an active component, which performs an action on the data store –
> which is non-atomic and hence requires start/stop transitions.
>
> The set of nodes which are eligible for registering an instance is different:
> - live instances are registered only by nodes which have the switch connected
> - dead instances are registered by all nodes
>
> Live instances are always preferred before dead ones, which implies a simple
> feature addition: the notion of instance priority, which governs selection
> of ownership. Higher-priority instances are always preferred to
> lower-priority instances in a cluster.
>
> A high priority instance is registered when a switch is connected. When a
> switch disconnects it does not delete data. Therefore an inter-node switch
> failover will not incur a deletion flap (just a pure overwrite once the
> initial sync is done).
>
> A low priority instance is registered on every node when a switch connects
> (for example as reported by DTCL). When a switch disappears, the instance is
> unregistered. When this instance becomes active, it will just delete the
> data tree and shut down. This will mean that as long as the data is in the
> data store, there exists at least one low-priority instance.
>
> Is there a scenario which would not be addressed by this scheme?

This seems like a reasonable approach. We would need a priority in both EOS candidate and singleton service registrations. In EOS, if a candidate is registered with a higher priority than the current owner, then the new candidate should become the new owner. I think this logic would be implemented in the default selection strategy. We'd have to keep track of each candidate's priority in the data model.
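A rough sketch of the selection logic (hypothetical: the real selection strategy only sees the current owner and the viable candidates, so the candidate-to-priority map would have to be tracked alongside the registrations, as noted above):

    import java.util.Collection;
    import java.util.Map;

    // Hypothetical priority-aware owner selection, sketched for discussion only.
    public class PriorityBasedOwnerSelection {

        public String newOwner(String currentOwner, Collection<String> viableCandidates,
                Map<String, Integer> candidatePriorities) {
            String best = null;
            int bestPriority = Integer.MIN_VALUE;
            for (String candidate : viableCandidates) {
                int priority = candidatePriorities.getOrDefault(candidate, 0);
                // Higher priority wins; on a tie, prefer the current owner so an
                // equally prioritized newcomer does not take ownership away.
                if (priority > bestPriority || (priority == bestPriority && candidate.equals(currentOwner))) {
                    best = candidate;
                    bestPriority = priority;
                }
            }
            return best;
        }
    }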

Comment by Anil Vishnoi [ 10/Mar/17 ]

(In reply to Robert Varga from comment #3)
> A pure notification scheme will not suffice, because the two scenarios are
> really quite different and transitions between them require synchronization.

In my opinion, the root cause of both of these issues is the same – the non-candidate instances don't know that an entity has become owner-less. That's the reason the active instance is forced to clean up the data.
>
> At the end of the day, what happens to the data store is up to OFP. From the
> description it is clear that the data subtree is owned either:
> - a 'live' instance connected to the switch, updating the data
> - a 'dead' instance, making sure the data is erased

If the device is connected to only one controller, then the moment it gets disconnected, all the nodes in the cluster are basically dead nodes, aren't they? Or am I confused about the definition of live and dead?
>
> Note that at any given time, even in a partition, the data is logically
> owned by an active component, which performs an action on the data store –
> which is non-atomic and hence requires start/stop transitions.
Agree
>
> The set of nodes which are eligible for registering an instance is different:
> - live instances are registered only by nodes which have the switch connected
> - dead instances are registered by all nodes
Is the dead-instance registration an implicit registration done by the clustering service, or an explicit registration?
>
> Live instances are always preferred before dead ones, which implies a simple
> feature addition: the notion of instance priority, which governs selection
> of ownership. Higher-priority instances are always preferred to
> lower-priority instances in a cluster.
To do what? To notify stopServiceInstance?
>
> A high priority instance is registered when a switch is connected. When a
> switch disconnects it does not delete data. Therefore an inter-node switch
> failover will not incur a deletion flap (just a pure overwrite once the
> initial sync is done).
Understood (I assume high-priority instance == live instance and low-priority instance == dead instance).
>
> A low priority instance is registered on every node when a switch connects
> (for example as reported by DTCL).
In the case of openflowplugin, registration happens when the switch connects to the controller (the data store is not in the picture at this point). So with this approach, will openflowplugin have to listen on the data store in order to register the low-priority instance? If not, the plugin has no way to determine when it needs to register the low-priority instance. But if it has to rely on data store notifications for registration, that can lead to the stale-data situation, e.g. when the switch gets disconnected immediately after the related data is written to the data store.

> When a switch disappears, the instance is
> unregistered. When this instance becomes active, it will just delete the
> data tree and shut down.
By active, do you mean the clustering service calls back some notification to the low-priority instances? Will the clustering service choose one of the low-priority instances, or will it notify both of them?
> This will mean that as long as the data is in the
> data store, there exists at least one low-priority instance.
>
> Is there a scenario which would not be addressed by this scheme?

Comment by Tom Pantelis [ 10/Mar/17 ]

I think what Robert is saying is that there would be 2 registration types for a switch's singleton service, "live" and "dead". The "live" one is registered on switch connect and becomes eligible to manage the switch. This is what it does now. The "dead" one is registered by every node in the cluster and is responsible for cleaning up the data store when the switch disconnects from all nodes, i.e. when all "live" registrations go away. So the idea is to activate one and only one "live" candidate when a switch connects and one and only one "dead" candidate when all "live" candidates unregister. In this manner, "live" candidates take precedence over "dead" ones. A simple way to do this is to introduce the concept of priority with EOS candidate and singleton service registrations. So OF would register "live" candidates with a higher priority, e.g. all "live" candidates with priority 2 and all "dead" candidates with priority 1.
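Put differently, the "dead" (priority 1) candidate would be an ordinary singleton service whose only job, once activated, is to erase the device subtree and go away; a sketch with assumed wiring (the priority-aware registration itself remains the hypothetical part):

    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import org.opendaylight.controller.md.sal.binding.api.DataBroker;
    import org.opendaylight.controller.md.sal.binding.api.WriteTransaction;
    import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
    import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonService;
    import org.opendaylight.mdsal.singleton.common.api.ServiceGroupIdentifier;
    import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

    // "Dead" candidate: registered with low priority on every node that knows about
    // the switch, so it is only ever activated once no "live" candidate is left.
    public class DeadDeviceService implements ClusterSingletonService {
        private final ServiceGroupIdentifier identifier;
        private final DataBroker dataBroker;
        private final InstanceIdentifier<?> devicePath;

        public DeadDeviceService(ServiceGroupIdentifier identifier, DataBroker dataBroker,
                InstanceIdentifier<?> devicePath) {
            this.identifier = identifier;
            this.dataBroker = dataBroker;
            this.devicePath = devicePath;
        }

        @Override
        public ServiceGroupIdentifier getIdentifier() {
            return identifier;
        }

        @Override
        public void instantiateServiceInstance() {
            // No "live" instance exists anywhere, i.e. the switch has disconnected
            // from the whole cluster: erase its operational data.
            WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
            tx.delete(LogicalDatastoreType.OPERATIONAL, devicePath);
            tx.submit();
            // The owning code would then close this registration so the service goes away.
        }

        @Override
        public ListenableFuture<Void> closeServiceInstance() {
            return Futures.immediateFuture(null);
        }
    }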

Comment by Anil Vishnoi [ 11/Mar/17 ]

Okay, it's a bit clearer now. But it looks like we want the application to do the explicit registration of the "dead" candidate per device. That raises another question – how will the plugin know when to register the dead candidate for a specific device? For example, if a device connects to controller-1, it can register as a "live" and a "dead" instance, but the other controllers won't be able to register as "dead" instances because the switch didn't connect to them. If the application is expected to rely on listening on the data store – so that once controller-1 writes to the data store, the other controllers are notified and can then register as "dead" instances – I see the possibility of race conditions in this approach.

The issues that I reported in this bug are something we solved by listening for the ownership change event (isOwner=false, hasOwner=false, wasOwner=false) when we were using EntityOwnershipService. But this popped up again once we moved to singleton clustering, because of its limitation. I am not able to understand in which scenario this notification approach won't work. The approach that we are discussing here is pretty much doing the same thing that EOS was doing; it's just that now the application will have to do an additional registration, whereas EOS took care of it implicitly.

Comment by Robert Varga [ 11/Mar/17 ]

I do not have a ready answer for "how do other nodes know to register" – I have proposed DTCL for the sake of argument, but answering that requires understanding what tools are at the application's disposal.

Pure notifications are prone to races in the face of switch reconnects: while a 'no leader' notification is being delivered and the data is being cleared by a node, the switch may already be transitioning to the up state, and you end up erasing the new data – which boils down precisely to Colin's 'you get a lease' argument.

Comment by Viera Zelcamova [ 27/Mar/17 ]

Removing target date and assignee. I assume this bug belongs under agreement for the next step.

Comment by OpenDaylight Release [ 03/May/18 ]

Since the bug is unassigned I'm currently assigning it to you.

Please assign to the relevant person. 

Comment by Anil Vishnoi [ 25/May/18 ]

opendaylight.release I am not working on this bug, so I removed myself as assignee. I would suggest that if the bug is not assigned, it either be assigned to the PTL or kept unassigned.
