<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:14:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[BGPCEP-872] Controller does not send KEEPALIVE while learning prefixes</title>
                <link>https://jira.opendaylight.org/browse/BGPCEP-872</link>
                <project id="10108" key="BGPCEP">bgpcep</project>
                    <description>&lt;p&gt;The regression is detected here:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-bgp-ingest-mixed-all-neon/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-bgp-ingest-mixed-all-neon/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The BGP scale test uses the play.py script to set up a BGP session and inject 500K prefixes into the controller:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
python play.py --amount 500000 --myip=10.30.171.84 --myport=17900 --peerip=10.30.170.47 --peerport=1790 --insert=10 --withdraw=9 --prefill 10 --update single --info --results bgp.csv &amp;amp;&amp;gt; play.py.out
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After ~3 minutes (the hold timer), the BGP script throws this error:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2019-05-06 10:08:44,090 INFO BGP-Dummy-1: Iteration: 296000 - total remaining prefixes: 203991
2019-05-06 10:08:44,726 ERROR BGP-Dummy-1: Peer has overstepped the hold timer.
Unhandled exception in thread started by &amp;lt;function job at 0x7fefba965938&amp;gt;
Traceback (most recent call last):
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;play.py&quot;&lt;/span&gt;, line 2066, in job
    state.perform_one_loop_iteration()
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;play.py&quot;&lt;/span&gt;, line 1958, in perform_one_loop_iteration
    self.timer.check_peer_hold_time(self.timer.snapshot_time)
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;play.py&quot;&lt;/span&gt;, line 1429, in check_peer_hold_time
    raise RuntimeError(&lt;span class=&quot;code-quote&quot;&gt;&quot;Peer has overstepped the hold timer.&quot;&lt;/span&gt;)
RuntimeError: Peer has overstepped the hold timer.
Traceback (most recent call last):
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;play.py&quot;&lt;/span&gt;, line 2168, in &amp;lt;module&amp;gt;
    threaded_job(arguments)
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;play.py&quot;&lt;/span&gt;, line 2162, in threaded_job
    rpcserver.serve_forever()
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;/usr/lib/python2.7/SocketServer.py&quot;&lt;/span&gt;, line 231, in serve_forever
    poll_interval)
  File &lt;span class=&quot;code-quote&quot;&gt;&quot;/usr/lib/python2.7/SocketServer.py&quot;&lt;/span&gt;, line 150, in _eintr_retry
    &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; func(*args)
KeyboardInterrupt
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the session is disconnected:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2019-05-06T10:08:46,632 | INFO  | epollEventLoopGroup-10-1 | BGPSessionImpl                   | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | End of input detected. Close the session.
2019-05-06T10:08:46,633 | INFO  | epollEventLoopGroup-10-1 | BGPPeer                          | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Session with peer 10.30.171.99 went down
2019-05-06T10:08:46,633 | INFO  | epollEventLoopGroup-10-1 | BGPPeer                          | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing session with peer
2019-05-06T10:08:46,650 | INFO  | epollEventLoopGroup-10-1 | AbstractPeer                     | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp:&lt;span class=&quot;code-comment&quot;&gt;//10.30.171.99}] removed
&lt;/span&gt;2019-05-06T10:08:46,653 | INFO  | epollEventLoopGroup-10-1 | AbstractPeer                     | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing peer chain Uri{_value=bgp:&lt;span class=&quot;code-comment&quot;&gt;//10.30.171.99}
&lt;/span&gt;2019-05-06T10:08:46,659 | INFO  | epollEventLoopGroup-10-1 | BGPSessionImpl                   | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing session: BGPSessionImpl{channel=[id: 0x79806baf, L:/10.30.171.133:1790 ! R:/10.30.171.99:17900], state=UP}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;According to the play.py logs, the controller does not send any KEEPALIVE message while it is learning the prefixes; this is the main reason the test fails. See the attached test tool logs: when it works, the controller sends one KEEPALIVE every ~1 min.&lt;/p&gt;</description>
                <environment></environment>
        <key id="31662">BGPCEP-872</key>
            <summary>Controller does not send KEEPALIVE while learning prefixes</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.opendaylight.org/images/icons/priorities/critical.svg">High</priority>
                        <status id="10003" iconUrl="https://jira.opendaylight.org/images/icons/status_generic.gif" description="">Confirmed</status>
                    <statusCategory id="2" key="new" colorName="blue-gray"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="ecelgp">Luis Gomez</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 May 2019 17:53:51 +0000</created>
                <updated>Mon, 3 Aug 2020 10:01:46 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                                                <comments>
                            <comment id="66758" author="ecelgp" created="Mon, 6 May 2019 20:11:04 +0000"  >&lt;p&gt;The issue is visible in all branches, but in Neon it started to happen very recently (May 1st).&lt;/p&gt;</comment>
                            <comment id="66779" author="ecelgp" created="Wed, 8 May 2019 20:52:24 +0000"  >&lt;p&gt;FYI, I set up a job in the sandbox to run the scale test for any candidate patch:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-1node-bgp-ingest-mixed-all-neon/10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-1node-bgp-ingest-mixed-all-neon/10&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="66780" author="ecelgp" created="Wed, 8 May 2019 22:51:06 +0000"  >&lt;p&gt;I just uploaded the BGP log from another test (bgp-tool-600k.txt), and there we can see that the controller sends the KEEPALIVE messages after the prefix learning is done. This shows the controller only fails to send KEEPALIVE messages while it is learning prefixes at a high rate; otherwise it is fine.&lt;/p&gt;</comment>
                            <comment id="66816" author="ecelgp" created="Wed, 22 May 2019 02:50:44 +0000"  >&lt;p&gt;From my observation, the missing KEEPALIVE issue happens very consistently whenever the controller is busy for a long time (a few minutes). This behavior has probably been there all along. Before the regression, the controller was quite possibly less busy, or took less time, for the same test case, and therefore the test passed. So this bug is exposed by a performance regression, and the workaround for the missing KEEPALIVE is to increase the BGP hold time on the BGP system peering with ODL.&lt;/p&gt;</comment>
                            <comment id="67032" author="rovarga" created="Mon, 29 Jul 2019 23:22:16 +0000"  >&lt;p&gt;I think this has been addressed.&lt;/p&gt;</comment>
                            <comment id="67033" author="ecelgp" created="Mon, 29 Jul 2019 23:52:03 +0000"  >&lt;p&gt;FYI I recently adjusted CSIT thresholds so we can effectively check for further regressions:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/c/releng/builder/+/83178&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/c/releng/builder/+/83178&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I actually forgot to update this ticket with this information, so maybe this is why the regression appears to be gone?&lt;/p&gt;</comment>
                            <comment id="67043" author="ecelgp" created="Wed, 31 Jul 2019 11:11:17 +0000"  >&lt;p&gt;Right, by simply running the test with the original 500K prefixes instead of the current 250K, we can see the regression is still there (use TESTOPTIONS -v COUNT:500000):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-1node-gate-bgp-ingest-mixed-all-sodium/5/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-1node-gate-bgp-ingest-mixed-all-sodium/5/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="67044" author="rovarga" created="Wed, 31 Jul 2019 14:20:40 +0000"  >&lt;p&gt;Okay, this is kind of weird, as I definitely saw a keepalive being sent in one of the gate job runs. Let&apos;s see what &lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-1node-gate-bgp-ingest-mixed-all-sodium/6/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-1node-gate-bgp-ingest-mixed-all-sodium/6/&lt;/a&gt; does (with BGPSessionImpl debugs)&lt;/p&gt;</comment>
                            <comment id="67047" author="rovarga" created="Thu, 1 Aug 2019 16:07:54 +0000"  >&lt;p&gt;Okay, this looks like a netty read starvation ... I will need to investigate it a bit more.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="15212" name="bgp-tool-600k.txt" size="67046" author="ecelgp" created="Wed, 8 May 2019 22:47:09 +0000"/>
                            <attachment id="15205" name="bgp-tool-nok.txt" size="35652" author="ecelgp" created="Mon, 6 May 2019 23:36:37 +0000"/>
                            <attachment id="15203" name="bgp-tool-ok.txt" size="59267" author="ecelgp" created="Mon, 6 May 2019 22:50:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03non:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>