<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 20:09:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[MDSAL-231] Singleton Clustering : No notification to cluster nodes if the entity doesn&apos;t have any owner</title>
                <link>https://jira.opendaylight.org/browse/MDSAL-231</link>
                <project id="10137" key="MDSAL">mdsal</project>
                    <description>&lt;p&gt;With the current implementation of the singleton clustering service, it provides two notifications for an entity - instantiateServiceInstance() &amp;amp; closeServiceInstance(). Whenever an entity is unregistered, closeServiceInstance() is called so that the application instance can do the cleanup. We encountered an issue with openflowplugin clustering that can&apos;t be solved using these two existing notifications. Assume a scenario where a device is connected to only one controller, so that specific controller registers that device as an entity and gets the ownership. If this controller dies, the other nodes in the cluster won&apos;t get any notification about the status of that device entity, because they are not registered candidates. So all the data written by the owner controller to the data store will remain there. &lt;/p&gt;

&lt;p&gt;EntityOwnershipService does notify non-candidate nodes if an entity doesn&apos;t have any owner (isOwner=false, wasOwner=false, hasOwner=false). To resolve the above-mentioned issue, I think the singleton service should expose a new notification (e.g. noOwnerFound() or noOwnerElected()) that gets triggered in that scenario (isOwner=false, wasOwner=false, hasOwner=false), so that other controllers can do the required clean-up. Currently openflowplugin clustering is pretty much broken, because we encountered two issues that can&apos;t be solved using singleton clustering without a notification similar to the one I mentioned above.&lt;/p&gt;

&lt;p&gt;We are planning to explore using an EOS ownership change listener together with the singleton clustering service to resolve these issues, but if that doesn&apos;t work, this bug is pretty much a blocker for us.&lt;/p&gt;

&lt;p&gt;Please let me know if you need more details to clearly understand the issue.&lt;/p&gt;</description>
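<!--
A minimal sketch, for illustration only, of the two-callback contract the description above refers
to, plus where the proposed no-owner callback would fit. Package names and exact signatures follow
the MD-SAL singleton API as commonly used at the time and may differ between releases; the
noOwnerElected() method is the PROPOSED addition from this issue, not an existing API. Registration
is assumed to go through ClusterSingletonServiceProvider.registerClusterSingletonService(service).

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import org.opendaylight.mdsal.singleton.common.api.ClusterSingletonService;
import org.opendaylight.mdsal.singleton.common.api.ServiceGroupIdentifier;

public class SwitchSingletonService implements ClusterSingletonService {
    private final ServiceGroupIdentifier identifier;

    public SwitchSingletonService(String switchId) {
        // One service group per device entity, e.g. keyed by the switch id.
        this.identifier = ServiceGroupIdentifier.create(switchId);
    }

    @Override
    public ServiceGroupIdentifier getIdentifier() {
        return identifier;
    }

    @Override
    public void instantiateServiceInstance() {
        // Invoked on the node that wins ownership of the device entity:
        // start managing the switch and write its operational data.
    }

    @Override
    public ListenableFuture<Void> closeServiceInstance() {
        // Invoked on the current owner when it loses ownership or unregisters:
        // stop managing the switch and clean up this instance's state.
        return Futures.immediateFuture(null);
    }

    // PROPOSED, does not exist today: invoked on every node when the entity ends up with no
    // owner at all (isOwner=false, wasOwner=false, hasOwner=false), so that surviving nodes
    // can remove the stale operational data left behind by a dead owner.
    // public void noOwnerElected() { ... }
}
-->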
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="27053">MDSAL-231</key>
            <summary>Singleton Clustering : No notification to cluster nodes if the entity doesn&apos;t have any owner</summary>
                <type id="10103" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="2" iconUrl="https://jira.opendaylight.org/images/icons/priorities/critical.svg">High</priority>
                        <status id="10003" iconUrl="https://jira.opendaylight.org/images/icons/status_generic.gif" description="">Confirmed</status>
                    <statusCategory id="2" key="new" colorName="blue-gray"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="Avishnoi">Anil Vishnoi</reporter>
                        <labels>
                    </labels>
                <created>Sat, 18 Feb 2017 01:40:31 +0000</created>
                <updated>Tue, 9 Jan 2024 09:05:37 +0000</updated>
                                                            <fixVersion>14.0.0</fixVersion>
                                    <component>CSS</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                                                                <comments>
                            <comment id="54584" author="rovarga" created="Mon, 20 Feb 2017 13:33:22 +0000"  >&lt;p&gt;I think this needs a more thorough analysis as to what is the problem and why, along with failure modes.&lt;/p&gt;

&lt;p&gt;The description leads me to believe that stale data during a partition is a problem &amp;#8211; with the implicit assumption that the owner node is in the minority partition and does not sever the connection to the switch (and hence the switch does not connect to another node).&lt;/p&gt;

&lt;p&gt;We are past API freeze and this is a very sensitive change, so the mechanics need to be considered very carefully.&lt;/p&gt;</comment>
                            <comment id="54585" author="vishnoianil@gmail.com" created="Mon, 20 Feb 2017 22:05:26 +0000"  >&lt;p&gt;(In reply to Robert Varga from comment #1)&lt;br/&gt;
&amp;gt; I think this needs a more thorough analysis as to what is the problem and&lt;br/&gt;
&amp;gt; why, along with failure modes.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; The description leads me to believe that stale data during a partition is a&lt;br/&gt;
&amp;gt; problem &amp;#8211; with the implicit assumption that the owner node is in the&lt;br/&gt;
&amp;gt; minority partition and does not sever the connection to the switch (and&lt;br/&gt;
&amp;gt; hence the switch does not connect to another node).&lt;br/&gt;
&amp;gt; &lt;br/&gt;
I think at a high level the problem is that the singleton clustering service doesn&apos;t notify candidate/non-candidate nodes when an entity has no owner (irrespective of whether other nodes registered their candidature for that specific entity). Without this notification, we are currently encountering the following two issues: &lt;/p&gt;

&lt;p&gt;Issue 1:&lt;br/&gt;
(1) Connect the switch to only one controller in the cluster&lt;br/&gt;
(2) Kill the controller&lt;br/&gt;
(3) Stale data remains in the data store&lt;/p&gt;

&lt;p&gt;Issue 2:&lt;/p&gt;

&lt;p&gt;(1) Connect the switch to all the controllers in the cluster&lt;br/&gt;
(2) Disconnect the switch from its current owner controller&lt;br/&gt;
(3) This triggers closeServiceInstance() on the current owner controller&lt;br/&gt;
(4) Given that the service instance is closed, it removes the data from the data store&lt;br/&gt;
(5) That generates a data change notification for listener applications, and the application interprets it as the switch disconnecting from the controller, but in reality the switch is still connected to other cluster nodes.&lt;br/&gt;
(6) Meanwhile another owner is elected and instantiateServiceInstance() is called, and that instance puts the switch data back into the data store&lt;br/&gt;
(7) Step 6 generates another data change notification with the switch data added, which is interpreted as the switch connecting to the controller; this is wrong, because the switch was already connected.&lt;/p&gt;

&lt;p&gt;All this is happening because each individual instance does its own clean-up: it can&apos;t rely on other nodes to do the clean-up, because it&apos;s not sure whether the other nodes registered themselves as candidates or not. &lt;/p&gt;

&lt;p&gt;&amp;gt; We are past API freeze and this is a very sensitive change, so the mechanics&lt;br/&gt;
&amp;gt; need to be considered very carefully.&lt;br/&gt;
You mean adding this notification to the singleton clustering service?&lt;/p&gt;</comment>
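<!--
To illustrate why steps 5 and 7 above are misread by applications: a listener typically watches
the switch's operational node with a data tree change listener and maps DELETE to "switch
disconnected" and WRITE to "switch connected". A rough sketch assuming the controller-era DTCL
API; the Node type and the mapping are illustrative, not the actual openflowplugin code.

import java.util.Collection;
import org.opendaylight.controller.md.sal.binding.api.DataTreeChangeListener;
import org.opendaylight.controller.md.sal.binding.api.DataTreeModification;
import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node;

public class SwitchPresenceListener implements DataTreeChangeListener<Node> {
    @Override
    public void onDataTreeChanged(Collection<DataTreeModification<Node>> changes) {
        for (DataTreeModification<Node> change : changes) {
            switch (change.getRootNode().getModificationType()) {
                case DELETE:
                    // Step 5: seen as "switch disconnected", even though the switch is still
                    // connected to the other cluster nodes.
                    break;
                case WRITE:
                    // Step 7: seen as "switch connected", even though it never actually left.
                    break;
                default:
                    break;
            }
        }
    }
}
-->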
                            <comment id="54586" author="rovarga" created="Mon, 27 Feb 2017 22:06:01 +0000"  >&lt;p&gt;A pure notification scheme will not suffice, because the two scenarios are really quite different and transitions between them require synchronization.&lt;/p&gt;

&lt;p&gt;At the end of the day, what happens to the data store is up to OFP. From the description it is clear that the data subtree is owned by either:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;a &apos;live&apos; instance connected to the switch, updating the data&lt;/li&gt;
	&lt;li&gt;a &apos;dead&apos; instance, making sure the data is erased&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Note that at any given time, even in a partition, the data is logically owned by an active component, which performs an action on the data store &amp;#8211; which is non-atomic and hence requires start/stop transitions.&lt;/p&gt;

&lt;p&gt;The set of nodes which are eligible for registering an instance is different:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;live instances are registered only by nodes which have the switch connected&lt;/li&gt;
	&lt;li&gt;dead instances are registered by all nodes&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Live instances are always preferred before dead ones, which implies a simple feature addition: the notion of instance priority, which governs selection of ownership. Higher-priority instances are always preferred to lower-priority instances in a cluster.&lt;/p&gt;

&lt;p&gt;A high priority instance is registered when a switch is connected. When a switch disconnects &lt;b&gt;it does not delete data&lt;/b&gt;. Therefore an inter-node switch failover will not incur a deletion flap (just a pure overwrite once the initial sync is done).&lt;/p&gt;

&lt;p&gt;A low priority instance is registered on every node when a switch connects (for example as reported by DTCL). When a switch disappears, the instance is unregistered. When this instance becomes active, it will just delete the data tree and shut down. This will mean that as long as the data is in the data store, there exists at least one low-priority instance.&lt;/p&gt;

&lt;p&gt;Is there a scenario which would not be addressed by this scheme?&lt;/p&gt;</comment>
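<!--
A hypothetical sketch of what the live/dead priority scheme proposed in the preceding comment
could look like from the application side. The priority argument to
registerClusterSingletonService does not exist in the current API, and LiveSwitchService /
DeadSwitchService are placeholder ClusterSingletonService implementations; everything here is an
illustration of the proposal, not working code against any released API.

public class SwitchLifecycleManager {
    private final ClusterSingletonServiceProvider singletonProvider;

    public SwitchLifecycleManager(ClusterSingletonServiceProvider singletonProvider) {
        this.singletonProvider = singletonProvider;
    }

    // Called only on nodes that actually have the switch connected.
    public void onSwitchConnectedLocally(String switchId) {
        // High-priority "live" instance: manages the switch; on closeServiceInstance() it does
        // NOT delete the data, so an inter-node failover produces no delete/add flap.
        singletonProvider.registerClusterSingletonService(new LiveSwitchService(switchId), 2);
    }

    // Called on every node, for example when a DTCL reports the switch's subtree appearing.
    public void onSwitchSeenInDatastore(String switchId) {
        // Low-priority "dead" instance: becomes active only when no live instance remains,
        // at which point it deletes the switch's subtree and shuts itself down.
        singletonProvider.registerClusterSingletonService(new DeadSwitchService(switchId), 1);
    }
}
-->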
                            <comment id="54587" author="rgoulding" created="Tue, 7 Mar 2017 17:11:25 +0000"  >&lt;p&gt;Anil, do you have an update for this?  We are blocked until we receive feedback.  Thanks!&lt;/p&gt;</comment>
                            <comment id="54588" author="tpantelis" created="Wed, 8 Mar 2017 04:16:32 +0000"  >&lt;p&gt;(In reply to Robert Varga from comment #3)&lt;br/&gt;
&amp;gt; A pure notification scheme will not suffice, because the two scenarios are&lt;br/&gt;
&amp;gt; really quite different and transitions between them require synchronization.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; At the end of the day, what happens to the data store is up to OFP. From the&lt;br/&gt;
&amp;gt; description it is clear that the data subtree is owned either:&lt;br/&gt;
&amp;gt; - a &apos;live&apos; instance connected to the switch, updating the data&lt;br/&gt;
&amp;gt; - a &apos;dead&apos; instance, making sure the data is erased&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Note that at any given time, even in a partition, the data is logically&lt;br/&gt;
&amp;gt; owned by an active component, which performs an action on the data store &amp;#8211;&lt;br/&gt;
&amp;gt; which is non-atomic and hence requires start/stop transitions.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; The set of nodes which are eligible for registering an instance is different:&lt;br/&gt;
&amp;gt; - live instances are registered only by nodes which have the switch connected&lt;br/&gt;
&amp;gt; - dead instances are registered by all nodes&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Live instances are always preferred before dead ones, which implies a simple&lt;br/&gt;
&amp;gt; feature addition: the notion of instance priority, which governs selection&lt;br/&gt;
&amp;gt; of ownership. Higher-priority instances are always preferred to&lt;br/&gt;
&amp;gt; lower-priority instances in a cluster.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; A high priority instance is registered when a switch is connected. When a&lt;br/&gt;
&amp;gt; switch disconnects &lt;b&gt;it does not delete data&lt;/b&gt;. Therefore an inter-node switch&lt;br/&gt;
&amp;gt; failover will not incur a deletion flap (just a pure overwrite once the&lt;br/&gt;
&amp;gt; initial sync is done).&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; A low priority instance is registered on every node when a switch connects&lt;br/&gt;
&amp;gt; (for example as reported by DTCL). When a switch disappears, the instance is&lt;br/&gt;
&amp;gt; unregistered. When this instance becomes active, it will just delete the&lt;br/&gt;
&amp;gt; data tree and shut down. This will mean that as long as the data is in the&lt;br/&gt;
&amp;gt; data store, there exists at least one instance low-priority instance.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Is there a scenario which would not be addressed by this scheme?&lt;/p&gt;

&lt;p&gt;This seems like a reasonable approach. We would need a priority in both EOS candidate and singleton service registrations. In EOS, if a candidate is registered with a higher priority than the current owner, then the new candidate should become the new owner. I think this logic would be implemented in the default selection strategy. We&apos;d have to keep track of each candidate&apos;s priority in the data model.&lt;/p&gt;</comment>
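<!--
A purely illustrative version of the selection rule discussed here: pick the highest-priority
viable candidate, preferring the current owner on ties. The priority map is a stand-in; in the
real system this logic would live in the EOS owner selection strategy and the per-candidate
priority would have to be tracked in its data model, as noted above.

import java.util.Collection;
import java.util.Map;

public final class PriorityOwnerSelection {

    public static String selectOwner(String currentOwner, Collection<String> viableCandidates,
            Map<String, Integer> candidatePriority) {
        String best = null;
        int bestPriority = Integer.MIN_VALUE;
        for (String candidate : viableCandidates) {
            int priority = candidatePriority.getOrDefault(candidate, 0);
            // Take a strictly higher priority; on equal priority keep the current owner if viable.
            if (priority > bestPriority
                    || (priority == bestPriority && candidate.equals(currentOwner))) {
                best = candidate;
                bestPriority = priority;
            }
        }
        return best;
    }
}
-->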
                            <comment id="54589" author="vishnoianil@gmail.com" created="Fri, 10 Mar 2017 02:40:52 +0000"  >&lt;p&gt;(In reply to Robert Varga from comment #3)&lt;br/&gt;
&amp;gt; A pure notification scheme will not suffice, because the two scenarios are&lt;br/&gt;
&amp;gt; really quite different and transitions between them require synchronization.&lt;/p&gt;

&lt;p&gt;In my opinion, the root cause of both of these issues is the same - non-candidate instances don&apos;t know about an entity that is owner-less. That&apos;s the reason the active instance is forced to clean up the data.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; At the end of the day, what happens to the data store is up to OFP. From the&lt;br/&gt;
&amp;gt; description it is clear that the data subtree is owned either:&lt;br/&gt;
&amp;gt; - a &apos;live&apos; instance connected to the switch, updating the data&lt;br/&gt;
&amp;gt; - a &apos;dead&apos; instance, making sure the data is erased&lt;/p&gt;

&lt;p&gt;If the device is connected to only one controller, then the moment it gets disconnected, all the nodes in the cluster are basically dead nodes, aren&apos;t they? Or am I confused about the definition of live and dead? &lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Note that at any given time, even in a partition, the data is logically&lt;br/&gt;
&amp;gt; owned by an active component, which performs an action on the data store &amp;#8211;&lt;br/&gt;
&amp;gt; which is non-atomic and hence requires start/stop transitions.&lt;br/&gt;
Agree&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; The set of nodes which are eligible for registering an instance is different:&lt;br/&gt;
&amp;gt; - live instances are registered only by nodes which have the switch connected&lt;br/&gt;
&amp;gt; - dead instances are registered by all nodes&lt;br/&gt;
Is dead-instance registration an implicit registration that the clustering service does, or an explicit registration?&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Live instances are always preferred before dead ones, which implies a simple&lt;br/&gt;
&amp;gt; feature addition: the notion of instance priority, which governs selection&lt;br/&gt;
&amp;gt; of ownership. Higher-priority instances are always preferred to&lt;br/&gt;
&amp;gt; lower-priority instances in a cluster.&lt;br/&gt;
To do what? To notify stopServiceInstance?&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; A high priority instance is registered when a switch is connected.When a&lt;br/&gt;
&amp;gt; switch disconnects &lt;b&gt;it does not delete data&lt;/b&gt;. Therefore an inter-node switch&lt;br/&gt;
&amp;gt; failover will not incur a deletion flap (just a pure overwrite once the&lt;br/&gt;
&amp;gt; initial sync is done).&lt;br/&gt;
Understood (I assume high-priority instance == live instance and low-priority instance == dead instance)&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; A low priority instance is registered on every node when a switch connects&lt;br/&gt;
&amp;gt; (for example as reported by DTCL).&lt;br/&gt;
In the case of openflowplugin, registration happens when the switch connects to the controller (no data store is in the picture at this point), so with this approach, will openflowplugin have to listen on the data store in order to register the low-priority instance? If not, then the plugin will have no way to determine that it needs to register the low-priority instance. But if it has to rely on data store notifications for registration, that can lead to the stale-data situation, in a scenario where the switch gets disconnected immediately after the related data is written to the data store.&lt;/p&gt;

&lt;p&gt;&amp;gt; When a switch disappears, the instance is&lt;br/&gt;
&amp;gt; unregistered. When this instance becomes active, it will just delete the&lt;br/&gt;
&amp;gt; data tree and shut down.&lt;br/&gt;
By active, do you mean the clustering service calls back some notification to the low-priority instances? Will the clustering service choose one of the low-priority instances, or will it notify both of the low-priority instances?&lt;br/&gt;
&amp;gt; This will mean that as long as the data is in the&lt;br/&gt;
&amp;gt; data store, there exists at least one instance low-priority instance.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Is there a scenario which would not be addressed by this scheme?&lt;/p&gt;</comment>
                            <comment id="54590" author="tpantelis" created="Fri, 10 Mar 2017 03:17:12 +0000"  >&lt;p&gt;I think what Robert is saying is that there would be 2 registration types for a switch&apos;s singleton service, &quot;live&quot; and &quot;dead&quot;. The &quot;live&quot; one is registered on switch connect and becomes eligible to manage the switch. This is what it does now. The &quot;dead&quot; one is registered by every node in the cluster and is responsible for cleaning up the data store when the switch disconnects from all nodes, i.e. when all &quot;live&quot; registrations go away. So the idea is to only activate one and only one &quot;live&quot; candidate when a switch connects and one and only one &quot;dead&quot; candidate when all &quot;live&quot; candidates unregister. In this manner, &quot;live&quot; candidates take precedence over &quot;dead&quot; ones. A simple way to do this is to introduce the concept of priority with EOS candidate and singleton service registrations. So OF would register &quot;live&quot; candidates with a higher priority, e.g. all &quot;live&quot; candidates with priority 2 and all &quot;dead&quot; candidates with priority 1.&lt;/p&gt;</comment>
                            <comment id="54591" author="vishnoianil@gmail.com" created="Sat, 11 Mar 2017 00:47:52 +0000"  >&lt;p&gt;Okay, it&apos;s bit clear now. But looks like we want application to do the explicit registration for &quot;dead&quot; candidate &lt;b&gt;per&lt;/b&gt; device. That raises another question &amp;#8211; how plugin will know when to register the dead candidate for specific device? For example, if device connects to controller-1, it can register as a &quot;live&quot; and &quot;dead&quot; instance, but other controllers won&apos;t be able to register as a &quot;dead&quot; instance because switch didn&apos;t connect to that. If we want application to rely on listening on data store, so that once controller-1 write to the data store, it will notify to other controller&apos;s and then they can register as a &quot;dead&quot; instance, but i see possibility of race conditions in this approach.&lt;/p&gt;

&lt;p&gt;The issues that I reported in this bug are something we solved by listening for the ownership change event (isOwner=false, hasOwner=false, wasOwner=false) when we were using EntityOwnershipService. But this popped up again once we moved to singleton clustering, because of its limitations. I am not able to understand in which scenario this notification approach won&apos;t work. Looking at the current approach that we are discussing here, it is pretty much doing the same thing that EOS was doing; it&apos;s just that now the application will have to do an additional registration, whereas EOS took care of it implicitly.&lt;/p&gt;</comment>
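<!--
A rough sketch of the EOS-based workaround mentioned above: listen for ownership changes of the
device entity and treat (isOwner=false, wasOwner=false, hasOwner=false) as "the entity has no
owner anywhere", triggering cleanup on the surviving nodes. The listener shape differs between
the controller and mdsal EOS APIs; the types below assume the controller-era API and the cleanup
method is a placeholder.

import org.opendaylight.controller.md.sal.common.api.clustering.Entity;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipChange;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;

public class NoOwnerCleanupListener implements EntityOwnershipListener {
    @Override
    public void ownershipChanged(EntityOwnershipChange change) {
        if (!change.isOwner() && !change.wasOwner() && !change.hasOwner()) {
            // No node owns this entity any more: the previous owner died or unregistered,
            // so remove the stale operational data written for this device.
            cleanupStaleData(change.getEntity());
        }
    }

    private void cleanupStaleData(Entity entity) {
        // Placeholder: delete the device's subtree from the operational data store.
    }
}
-->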
                            <comment id="54592" author="rovarga" created="Sat, 11 Mar 2017 13:03:12 +0000"  >&lt;p&gt;I do not have a ready answer for &quot;how do other nodes know to register&quot; &amp;#8211; I have proposed DTCL for sake of argument, but answering that requires understanding what tools are at the application&apos;s disposal.&lt;/p&gt;

&lt;p&gt;Pure notifications are prone to races in the face of switch reconnects: while a &apos;no leader&apos; notification is being delivered and the data is being cleared by a node, the switch may already be transitioning to the up state, and you end up erasing the new data &amp;#8211; which boils down precisely to Colin&apos;s &apos;you get a lease&apos; argument.&lt;/p&gt;</comment>
                            <comment id="54593" author="vzelcamo@cisco.com" created="Mon, 27 Mar 2017 15:54:40 +0000"  >&lt;p&gt;Removing target date, assignee. I assume this bug belong under agreement, for next step.&lt;/p&gt;</comment>
                            <comment id="62752" author="opendaylight.release" created="Thu, 3 May 2018 09:37:45 +0000"  >&lt;p&gt;Since the bug is unassigned I&apos;m currently assigning it to you.&lt;/p&gt;

&lt;p&gt;Please assign to the relevant person.&#160;&lt;/p&gt;</comment>
                            <comment id="63131" author="vishnoianil@gmail.com" created="Fri, 25 May 2018 00:45:27 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=opendaylight.release&quot; class=&quot;user-hover&quot; rel=&quot;opendaylight.release&quot;&gt;opendaylight.release&lt;/a&gt; I&#160;am not working on this bug, so i removed myself as assignee. I would suggest if the bug is not assigned, please assign it to PTL or keep it unassigned.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10003">
                    <name>Relates</name>
                                                                <inwardlinks description="relates to">
                                        <issuelink>
            <issuekey id="31735">OPNFLWPLUG-1072</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="31827">MDSAL-465</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7820</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=7820]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10308"><![CDATA[New Feature]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10202" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Priority</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10312"><![CDATA[High]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02x4f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>