<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 20:34:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[OPNFLWPLUG-1049] Switch handshaking loops indefinitely</title>
                <link>https://jira.opendaylight.org/browse/OPNFLWPLUG-1049</link>
                <project id="10155" key="OPNFLWPLUG">OpenFlowPlugin</project>
                    <description>&lt;p&gt;I&apos;m running ODL fluorine stable release (opendaylight-0.9.0.tar.gz) and facing this problem:&lt;/p&gt;

&lt;p&gt;When a simulated switch (mininet) connects and disconnects very frequently, the openflowplugin enters a loop and cannot&#160;recover the&#160;handshake properly. This also affects the connections of some other switches.&lt;/p&gt;

&lt;p&gt;I can reproduce the same issue in a network of real switches (Extreme/Edgecore switches).&lt;/p&gt;

&lt;p&gt;When this problem happens, there is also a huge memory leak: the number of&#160;DeviceContextImpl instances grows indefinitely.&lt;/p&gt;

&lt;p&gt;Steps to reproduce:&lt;/p&gt;

&lt;p&gt;1) run&#160;opendaylight-0.9.0&lt;/p&gt;

&lt;p&gt;2)&#160;feature:install features-openflowplugin&lt;/p&gt;

&lt;p&gt;3) run mininet (sudo mn --topo linear,20 --switch ovsk,protocols=OpenFlow13 --mac --controller remote,port=6633,ip=127.0.0.1)&lt;/p&gt;

&lt;p&gt;4) simulate a switch disconnection by running the command &quot;./changectrl.sh 10000 0.1&quot; (script in attachments)&lt;/p&gt;

&lt;p&gt;5) wait 1-2 minutes, you should see odl trying indefinitely to regain the connection&lt;/p&gt;

&lt;p&gt;6) stop the script; the memory leak is now&#160;growing (you can check the number of instances of DeviceContextImpl by running &quot;jcmd&#160;&amp;lt;pid&amp;gt; GC.class_histogram | grep -e &apos;org.opendaylight.openflowplugin.impl.device.DeviceContextImpl$&apos;&quot;)&lt;/p&gt;

&lt;p&gt;The karaf.log is also attached.&lt;/p&gt;</description>
                <environment></environment>
        <key id="31057">OPNFLWPLUG-1049</key>
            <summary>Switch handshaking loops indefinitely</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.opendaylight.org/images/icons/priorities/critical.svg">High</priority>
                        <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="Somashekhar">Somashekhar Javalagi</assignee>
                                    <reporter username="leonardo.milleri">Leonardo Milleri</reporter>
                        <labels>
                    </labels>
                <created>Thu, 15 Nov 2018 10:17:16 +0000</created>
                <updated>Mon, 27 Sep 2021 09:00:58 +0000</updated>
                            <resolved>Thu, 14 Nov 2019 09:54:31 +0000</resolved>
                                    <version>Nitrogen-SR1</version>
                    <version>Fluorine</version>
                                                    <component>openflowplugin-impl</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="65741" author="ecelgp" created="Sun, 25 Nov 2018 23:25:19 +0000"  >&lt;p&gt;FYI I downloaded latest fluorine controller from:&lt;br/&gt;
&lt;a href=&quot;https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/karaf/0.9.2-SNAPSHOT&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/karaf/0.9.2-SNAPSHOT&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and installed feature: odl-openflowplugin-flow-services-rest&lt;/p&gt;

&lt;p&gt;Then I started mininet and ran the attached script on a VM with OVS 2.8.1.&lt;/p&gt;

&lt;p&gt;After the above I am not able to reproduce the problem; maybe the CPU plays a role here. I am using a 4 CPU VM for the controller + a 2 CPU VM for mininet.&lt;/p&gt;</comment>
                            <comment id="65751" author="leonardo.milleri" created="Mon, 26 Nov 2018 14:16:15 +0000"  >&lt;p&gt;I&apos;ve just tried with the latest fluorine controller (0.9.2-SNAPSHOT) and I can still reproduce the problem.&lt;/p&gt;

&lt;p&gt;The first time was ok, but then I tried&#160;two more times and in both cases the problem was reproduced.&lt;/p&gt;

&lt;p&gt;To be honest, it is better to wait till the last moment, when&#160;the changectrl script ends.&lt;/p&gt;

&lt;p&gt;I&apos;m running mininet (Open vSwitch 2.9.0) on the same machine as the ODL controller (Dell Precision 3530 laptop).&lt;/p&gt;

</comment>
                            <comment id="65764" author="ecelgp" created="Mon, 26 Nov 2018 21:29:53 +0000"  >&lt;p&gt;OK, I changed the test setup to run on my laptop and now I see a couple of issues:&lt;/p&gt;

&lt;p&gt;1) Switches other than the flapping one (s1) start to disconnect &amp;amp; reconnect after the default inactivity probe (3 sec) kicks in. This aggravates the issue, but it can easily be suppressed by setting the OVS inactivity probe to a value bigger than the controller default inactivity probe (15 sec). You can use the following script after mininet is up to set the inactivity probe:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
#!/bin/bash
# Set the OVS inactivity probe on every Controller record.

# Collect the UUID of each Controller row (last field of each _uuid line).
x=$(sudo ovs-vsctl --columns=_uuid list Controller | awk &lt;span class=&quot;code-quote&quot;&gt;&apos;{print $NF}&apos;&lt;/span&gt;)
echo $x
&lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; i in $x
&lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;
       sudo ovs-vsctl set Controller &quot;$i&quot; inactivity_probe=20
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2) After applying the above and letting the flap run for a while, I see that after stopping the flap, switch s1 can no longer connect to the controller. This is something we have to fix, as for now the only workaround is to restart the controller.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=leonardo.milleri&quot; class=&quot;user-hover&quot; rel=&quot;leonardo.milleri&quot;&gt;leonardo.milleri&lt;/a&gt;, can you try with the inactivity probe configuration I posted and check if you see the same (switch s1 cannot connect anymore to controller) or any other faulty behavior?&lt;/p&gt;</comment>
                            <comment id="65789" author="leonardo.milleri" created="Tue, 27 Nov 2018 14:49:52 +0000"  >&lt;p&gt;Thank you&#160;Luis.&lt;/p&gt;

&lt;p&gt;Today for some reason I can&apos;t reproduce the problem with inactivity_probe=3.&lt;/p&gt;

&lt;p&gt;As for&#160;inactivity_probe=20, this is also working perfectly for me; eventually all the switches are connected (including s1).&lt;/p&gt;

&lt;p&gt;Can you please provide some more details about how the inactivity probe can affect the openflow connection? This can help on reproducing the problem as well.&lt;/p&gt;

&lt;p&gt;I&apos;ll carry on doing some other tests and let you know.&lt;/p&gt;

</comment>
                            <comment id="65804" author="leonardo.milleri" created="Wed, 28 Nov 2018 09:15:42 +0000"  >&lt;p&gt;Adding some more information: I&apos;ve just seen the problem in a real network of edgecore switches running pica8 NOS, opendaylight version is nitrogen-SR1.&lt;/p&gt;

&lt;p&gt;The following log fragment is related to the issue:&lt;/p&gt;

&lt;p&gt;2018-11-28 05:10:11,251 | WARN | tLoopGroup-16-14 | ClusterSingletonServiceGroupImpl | 397 - org.opendaylight.mdsal.singleton-dom-impl - 2.3.1 | Service group openflow:2465031308174919169 stopping unregistered service org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl@111fb98b&lt;br/&gt;
 2018-11-28 05:10:11,262 | ERROR | pool-60-thread-1 | OutboundQueueProviderImpl | 436 - org.opendaylight.openflowplugin.impl - 0.5.1 | No queue present, failing request&lt;br/&gt;
 2018-11-28 05:10:11,262 | WARN | pool-60-thread-1 | RpcContextImpl | 436 - org.opendaylight.openflowplugin.impl - 0.5.1 | Xid cannot be reserved for new RequestContext, node:openflow:2465031308174919169&lt;/p&gt;</comment>
                            <comment id="65810" author="ecelgp" created="Wed, 28 Nov 2018 18:31:43 +0000"  >&lt;p&gt;If you are running OpenFlow in a real network, the first thing to do is to set the switches&apos; inactivity_probe or equivalent timer (e.g. the time the switch waits before sending an echo request to the controller when it does not receive any packet from the controller) to more than 15 sec. The reason is that when the controller is busy it can fail to respond to the switch echo request, which causes the switch to disconnect and leads to further problems when the switch reconnects.&lt;/p&gt;</comment>
                            <comment id="65853" author="somashekhar" created="Sun, 2 Dec 2018 17:22:43 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=leonardo.milleri&quot; class=&quot;user-hover&quot; rel=&quot;leonardo.milleri&quot;&gt;leonardo.milleri&lt;/a&gt;, the script in the attached file only has a connect command for the switch. Should it be modified to trigger a disconnect as well?&lt;/p&gt;</comment>
                            <comment id="65854" author="ecelgp" created="Mon, 3 Dec 2018 01:40:10 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=Somashekhar&quot; class=&quot;user-hover&quot; rel=&quot;Somashekhar&quot;&gt;Somashekhar&lt;/a&gt;, if you open the script, the line that says:&lt;br/&gt;
 sudo ovs-vsctl set-controller s1 tcp:128.0.0.1:6633&lt;br/&gt;
should normally disconnect the switch as 128.0.0.1 is not a known address.&lt;/p&gt;</comment>
                            <comment id="65890" author="leonardo.milleri" created="Wed, 5 Dec 2018 13:44:02 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt;, do you know if anyone is working on this issue in ODL and how long would it take to fix it?&lt;/p&gt;

&lt;p&gt;Is there anything I can help with?&lt;/p&gt;

&lt;p&gt;In the meantime, I&apos;ll try to understand the implications of the workaround (inactivity_probe) and if it&apos;ll sort out any effects.&lt;/p&gt;</comment>
                            <comment id="65927" author="somashekhar" created="Mon, 10 Dec 2018 10:18:10 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=leonardo.milleri&quot; class=&quot;user-hover&quot; rel=&quot;leonardo.milleri&quot;&gt;leonardo.milleri&lt;/a&gt; Can you please test the scenario with the gerrit review to check whether the issue still occurs?&lt;/p&gt;</comment>
                            <comment id="65944" author="ecelgp" created="Mon, 10 Dec 2018 18:28:19 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=Somashekhar&quot; class=&quot;user-hover&quot; rel=&quot;Somashekhar&quot;&gt;Somashekhar&lt;/a&gt;, FYI I quickly tried your patch distribution:&lt;br/&gt;
&lt;a href=&quot;https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/openflowplugin/karaf/0.9.2-SNAPSHOT/karaf-0.9.2-20181210.143812-12.zip&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/openflowplugin/karaf/0.9.2-SNAPSHOT/karaf-0.9.2-20181210.143812-12.zip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I still see that, after running the flap script for a while, switch s1 (openflow:1) can no longer connect to the controller.&lt;/p&gt;</comment>
                            <comment id="65969" author="leonardo.milleri" created="Wed, 12 Dec 2018 16:33:34 +0000"  >&lt;p&gt;I tried the fix on top of the stable-fluorine branch and I was able to reproduce the problem when running the script the third time (in the first 2 attempts I couldn&apos;t reproduce it).&lt;/p&gt;

&lt;p&gt;In contrast, the master branch does not seem to have the same problem (I ran the script 5 times), or at least it is more robust. Are you aware of any commits in the master branch that actually fix or mitigate the issue?&lt;/p&gt;</comment>
                            <comment id="66024" author="somashekhar" created="Mon, 17 Dec 2018 05:13:03 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=leonardo.milleri&quot; class=&quot;user-hover&quot; rel=&quot;leonardo.milleri&quot;&gt;leonardo.milleri&lt;/a&gt; we are planning to introduce a device connection hold time, which is the minimum amount of time a switch has to wait before it can connect again, to reduce load on the controller.&lt;/p&gt;

&lt;p&gt;We will record the time the switch was last connected. If the switch connects again within the hold time, the connection will not be accepted.&lt;/p&gt;</comment>
                            <comment id="66051" author="leonardo.milleri" created="Tue, 18 Dec 2018 10:56:57 +0000"  >&lt;p&gt;Thank you, I&apos;ll import the attached changes and retest&lt;/p&gt;</comment>
                            <comment id="66136" author="vishnoianil@gmail.com" created="Mon, 7 Jan 2019 16:59:24 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt; If you get a chance can you please test the latest patch for this issue?&lt;/p&gt;</comment>
                            <comment id="66305" author="vishnoianil@gmail.com" created="Mon, 28 Jan 2019 17:30:04 +0000"  >&lt;p&gt;Discussed a dampening mechanism for a 3-node cluster setup. Having a local dampening mechanism (connection dampening in the context of a single node) and a global dampening mechanism (connection dampening across the three-node cluster) would be a great value add.&lt;/p&gt;</comment>
                            <comment id="67035" author="somashekhar" created="Tue, 30 Jul 2019 11:20:30 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this issue, we need CSIT to pass before we can proceed further. I am seeing some of the openflowplugin sodium CSIT jobs failing consistently, with the only reason given being ConnectionError: HTTPConnectionPool(host=&apos;10.30.170.90&apos;, port=8181). Have these been seen before?&lt;/p&gt;

&lt;p&gt;I just ran CSIT on a dummy test review; the logs are below.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/openflowplugin-patch-test-core-sodium/83/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/openflowplugin-patch-test-core-sodium/83/&lt;/a&gt;&lt;/p&gt;

</comment>
                            <comment id="67037" author="ecelgp" created="Tue, 30 Jul 2019 22:49:28 +0000"  >&lt;p&gt;This is due to this regression:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1906&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.opendaylight.org/browse/CONTROLLER-1906&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="67394" author="somashekhar" created="Thu, 14 Nov 2019 09:54:31 +0000"  >&lt;p&gt;Merged in master branch, which is magnesium&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="15033" name="changectrl.sh" size="238" author="leonardo.milleri" created="Thu, 15 Nov 2018 09:54:47 +0000"/>
                            <attachment id="15032" name="karaf.log.tar.gz" size="405175" author="leonardo.milleri" created="Thu, 15 Nov 2018 10:15:18 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03knj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>