<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 20:36:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[OVSDB-288] passive connection not reconnected if ovs service is restarted</title>
                <link>https://jira.opendaylight.org/browse/OVSDB-288</link>
                <project id="10158" key="OVSDB">ovsdb</project>
                    <description>&lt;p&gt;scenario:&lt;/p&gt;

&lt;p&gt;controller initiates the connection to passive ovs instance (e.g. ptcp)&lt;/p&gt;

&lt;p&gt;instance is connected and configurations can be made&lt;/p&gt;

&lt;p&gt;restart the ovs instance (reboot entire system, stop/start ovsdb-server/ovs-vswitchd)&lt;/p&gt;


&lt;p&gt;result:&lt;/p&gt;

&lt;p&gt;The ovs instance is no longer connected and no longer exists in the operational store. The ovsdb plugin does not seem to retry the connection.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="21980">OVSDB-288</key>
            <summary>passive connection not reconnected if ovs service is restarted</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="Avishnoi">Anil Vishnoi</assignee>
                                    <reporter username="jluhrsen">Jamo Luhrsen</reporter>
                        <labels>
                    </labels>
                <created>Wed, 3 Feb 2016 21:53:38 +0000</created>
                <updated>Wed, 6 Apr 2016 21:49:58 +0000</updated>
                            <resolved>Wed, 6 Apr 2016 21:49:58 +0000</resolved>
                                    <version>unspecified</version>
                                                    <component>Southbound.Open_vSwitch</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="41236" author="vishnoianil@gmail.com" created="Wed, 10 Feb 2016 20:22:51 +0000"  >&lt;p&gt;We can add retry logic, but there is no deterministic way to decide when and for how long we should retry. The switch may be gone for good and never come back up.&lt;/p&gt;

&lt;p&gt;But I agree that we should have some retry logic for a reasonable time, to at least check whether the switch is back. We could probably do 10 retries, with the first try 10 seconds after disconnection and the delay growing by 10 seconds for each subsequent try (1st try = 10s, 2nd = 20s, 3rd = 30s, 4th = 40s, ... 10th = 100s). That gives the user a window of around 9 minutes to reboot or reconfigure the system.&lt;/p&gt;</comment>
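<!--
The retry schedule proposed in the comment above can be sketched as follows. This is a minimal illustration in Python, not the plugin's code: the function name and parameters are hypothetical, and it assumes each delay is the wait before the next attempt, which is the reading that matches the "around 9 minutes" estimate.

```python
def linear_backoff_delays(attempts=10, step_seconds=10):
    """Return the wait before each retry: step, 2 * step, ..., attempts * step."""
    return [step_seconds * n for n in range(1, attempts + 1)]


delays = linear_backoff_delays()
total_seconds = sum(delays)

print(delays)         # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(total_seconds)  # 550
print(round(total_seconds / 60, 1), "minutes")  # 9.2 minutes
```

Under that assumption the waits add up to 550 seconds, a little over 9 minutes.
-->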
                            <comment id="41237" author="jluhrsen" created="Wed, 10 Feb 2016 21:56:24 +0000"  >&lt;p&gt;(In reply to Anil Vishnoi from comment #1)&lt;/p&gt;

&lt;p&gt;OFP has retry mechanisms, right? Why not use something similar? I am not sure we should ever give up, though. Maybe max out at 60s and then keep retrying once every 60s. I am thinking about scheduled outages, etc.&lt;/p&gt;</comment>
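<!--
For comparison, the alternative suggested in the comment above (grow the delay, cap it at 60 seconds, never give up) might look like the sketch below. The generator name and the 10 second starting step are assumptions made for illustration only; the comment itself only specifies the 60 second cap.

```python
from itertools import islice


def capped_backoff_delays(step_seconds=10, cap_seconds=60):
    """Yield growing waits that level off at cap_seconds and never end."""
    delay = min(step_seconds, cap_seconds)
    while True:
        yield delay
        delay = min(delay + step_seconds, cap_seconds)


# The generator is infinite, so take only the first few waits to inspect it.
print(list(islice(capped_backoff_delays(), 8)))  # [10, 20, 30, 40, 50, 60, 60, 60]
```

Because the generator never terminates, a caller would need its own stop condition, for example the node being removed from the config data store.
-->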
                            <comment id="41238" author="vishnoianil@gmail.com" created="Wed, 10 Feb 2016 22:57:07 +0000"  >&lt;p&gt;(In reply to Jamo Luhrsen from comment #2)&lt;/p&gt;

&lt;p&gt;OFP has no retry logic, because in OFP there are only active connections (switch to controller). Retrying forever is a bad idea: servers or compute nodes may never rejoin, in which case the controller would keep retrying indefinitely. For a planned outage, the application needs to remove that configuration from the data store and add it back once the server/ovsdb is back.&lt;/p&gt;</comment>
                            <comment id="41239" author="shague@redhat.com" created="Wed, 10 Feb 2016 23:58:35 +0000"  >&lt;p&gt;Another thought, OVS and OVSDB by default retry connections. It is in the other direction rather than the request here, but maybe since that is supported we don&apos;t really need the functionality requested in the bug as a high priority.&lt;/p&gt;</comment>
                            <comment id="41240" author="jluhrsen" created="Thu, 11 Feb 2016 00:56:08 +0000"  >&lt;p&gt;(In reply to Anil Vishnoi from comment #3)&lt;/p&gt;

&lt;p&gt;ok, you are right about retrying indefinitely.&lt;/p&gt;</comment>
                            <comment id="41241" author="jluhrsen" created="Thu, 11 Feb 2016 01:06:53 +0000"  >&lt;p&gt;So this bug is being blamed for CSIT failures for not quite the right reason. Maybe this should be, as Sam says, lowered to a simple enhancement request: some short-term retry mechanism for the case where our ovs node services go missing for a period.&lt;/p&gt;

&lt;p&gt;The more serious failure that CSIT is failing on goes like this&lt;br/&gt;
(seems like a new bug to track, but please advise on that)&lt;/p&gt;

&lt;p&gt;A. ovs in passive mode&lt;/p&gt;

&lt;p&gt;B. initiate connection from controller&lt;/p&gt;

&lt;p&gt;C. verify it exists in config and operational&lt;/p&gt;

&lt;p&gt;D. ovs-ctl stop, then start on ovs node (simulates node going away for a brief period)&lt;/p&gt;

&lt;p&gt;E. ovs configured back in passive mode&lt;/p&gt;

&lt;p&gt;F. verify operational does NOT see node (because we don&apos;t retry)&lt;/p&gt;

&lt;p&gt;G. verify still in config (we never deleted it)&lt;/p&gt;

&lt;p&gt;H. delete from config (starting suggested steps to recover)&lt;/p&gt;

&lt;p&gt;I. verify not in operational and config (as expected)&lt;/p&gt;

&lt;p&gt;J. initiate connection from controller; this now fails.&lt;/p&gt;


&lt;p&gt;log messages of interest, I think, coming from step E:&lt;/p&gt;

&lt;p&gt;2016-02-11 01:03:28,759 | INFO  | entLoopGroup-8-1 | OvsdbConnectionService           | 153 - org.opendaylight.ovsdb.library - 1.2.1.SNAPSHOT | Connection closed ConnectionInfo [Remote-address=209.132.179.50, Remote-port=6634, Local-address172.18.182.19, Local-port=36453, type=ACTIVE]&lt;br/&gt;
2016-02-11 01:03:28,760 | INFO  | entLoopGroup-8-1 | OvsdbConnectionManager           | 159 - org.opendaylight.ovsdb.southbound-impl - 1.2.1.SNAPSHOT | Library disconnected ACTIVE from /209.132.179.50:6634 to /172.18.182.19:36453. Cleaning up the operational data store&lt;br/&gt;
2016-02-11 01:03:28,783 | INFO  | lt-dispatcher-21 | OvsdbConnectionManager           | 159 - org.opendaylight.ovsdb.southbound-impl - 1.2.1.SNAPSHOT | Entity{type=&apos;ovsdb&apos;, id=/(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://209.132.179.50:6634}]} has no owner, cleaning up the operational data store&lt;/p&gt;


&lt;p&gt;log messages of interest, I think, coming from step J:&lt;/p&gt;

&lt;p&gt;2016-02-11 01:03:32,432 | INFO  | lt-dispatcher-21 | OvsdbConnectionManager           | 159 - org.opendaylight.ovsdb.southbound-impl - 1.2.1.SNAPSHOT | Disconnecting from 209.132.179.50:6634&lt;br/&gt;
2016-02-11 01:03:32,621 | WARN  | ult-dispatcher-4 | OvsdbDataChangeListener          | 159 - org.opendaylight.ovsdb.southbound-impl - 1.2.1.SNAPSHOT | Connection to device ConnectionInfo{getRemoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=209.132.179.50], _value=[2, 0, 9, ., 1, 3, 2, ., 1, 7, 9, ., 5, 0]], getRemotePort=PortNumber [_value=6634], augmentations={}} already exists. Plugin does not allow multiple connections to same device, hence dropping the request OvsdbNodeAugmentation{getConnectionInfo=ConnectionInfo{getRemoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=209.132.179.50], _value=[2, 0, 9, ., 1, 3, 2, ., 1, 7, 9, ., 5, 0]], getRemotePort=PortNumber [_value=6634], augmentations={}}}&lt;/p&gt;</comment>
                            <comment id="41242" author="vishnoianil@gmail.com" created="Thu, 11 Feb 2016 01:42:23 +0000"  >&lt;p&gt;(In reply to Sam Hague from comment #4)&lt;/p&gt;

&lt;p&gt;Yes, but that works only for active connections, not passive ones. For passive connections, adding it is more of an enhancement.&lt;/p&gt;</comment>
                            <comment id="41243" author="shague@redhat.com" created="Thu, 11 Feb 2016 01:57:34 +0000"  >&lt;p&gt;(In reply to Anil Vishnoi from comment #7)&lt;/p&gt;

&lt;p&gt;Agreed, that is what I meant by &quot;other direction&quot; - it is the active connection rather than passive.&lt;/p&gt;</comment>
                            <comment id="41244" author="vishnoianil@gmail.com" created="Thu, 10 Mar 2016 08:17:55 +0000"  >&lt;p&gt;stable/beryllium : &lt;a href=&quot;https://git.opendaylight.org/gerrit/36028&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/36028&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two problems are discussed in this issue:&lt;/p&gt;

&lt;p&gt;(1) There is no retry for a controller-initiated connection if the connection gets dropped.&lt;/p&gt;

&lt;p&gt;(2) Sometimes, when the switch goes away abruptly (machine down, etc.), the controller is not able to establish a connection to the switch and there is no operational data in the data store.&lt;/p&gt;

&lt;p&gt;The above patch fixes issue (1). It adds a reconciliation mechanism: if a controller-initiated connection gets dropped (connection flapping, machine crash), the controller immediately attempts to connect back, and after that it makes 10 attempts to connect to the switch at increasing intervals (10, 20, 30, 40, ... 100 seconds). Overall it waits about 9 minutes for the switch to come back up, and then gives up. If there are use cases where we need to wait longer, I am open to suggestions.&lt;/p&gt;

&lt;p&gt;We still recommend that, for a planned outage, the user application explicitly disconnect the switch.&lt;/p&gt;

&lt;p&gt;Issue (2) happens because the TCP connection is not reset/terminated properly on the controller side when the switch goes down without sending a TCP FIN packet. A fix for this issue is proposed in the following patch: &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/35436/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/35436/&lt;/a&gt;&lt;/p&gt;</comment>
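<!--
The reconnect behaviour described in the comment above (one immediate attempt, then 10 more attempts at growing intervals, then give up) can be sketched roughly as below. This is an illustration in Python under stated assumptions, not the code from the referenced patch: the function name is hypothetical, and connect and sleep are caller-supplied callables so the control flow can be exercised without a real switch or real waiting.

```python
def reconnect_with_backoff(connect, sleep, attempts=10, step_seconds=10):
    """Try once immediately, then up to `attempts` more times with growing waits.

    Returns True as soon as connect() succeeds, False once every attempt failed.
    """
    if connect():
        return True
    for n in range(1, attempts + 1):
        sleep(step_seconds * n)  # waits of 10, 20, ..., 100 seconds by default
        if connect():
            return True
    return False


# Simulate a switch that comes back on the fourth connection attempt.
waits = []
outcomes = iter([False, False, False, True])
ok = reconnect_with_backoff(lambda: next(outcomes), waits.append)
print(ok, waits)  # True [10, 20, 30]
```

Passing the sleep function in, rather than calling time.sleep directly, keeps the sketch testable; a real caller could pass time.sleep or a scheduler hook instead.
-->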
                            <comment id="41245" author="vishnoianil@gmail.com" created="Thu, 31 Mar 2016 22:33:06 +0000"  >&lt;p&gt;patch stable/beryllium : &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/36028/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/36028/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="41246" author="vishnoianil@gmail.com" created="Thu, 31 Mar 2016 22:34:12 +0000"  >&lt;p&gt;wiki : &lt;a href=&quot;https://wiki.opendaylight.org/view/OVSDB_Integration:OVSDB_SB_Reconciliation&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.opendaylight.org/view/OVSDB_Integration:OVSDB_SB_Reconciliation&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="41247" author="jluhrsen" created="Wed, 6 Apr 2016 21:49:58 +0000"  >&lt;p&gt;As outlined in comment 6, this bug is fixed.  I have verified in stable/beryllium distro built on 04/02/2016.&lt;/p&gt;

&lt;p&gt;moving to Resolved/Fixed state.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5221</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=5221]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10305"><![CDATA[Improvement]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10330"><![CDATA[Beryllium-1]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i021t3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>