<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 20:15:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[NETCONF-716] regression in netconf-scale CSIT between Mg Sr1 and Sr2</title>
                <link>https://jira.opendaylight.org/browse/NETCONF-716</link>
                <project id="10142" key="NETCONF">netconf</project>
                    <description>&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/netconf-csit-1node-scale-only-magnesium&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;This job &lt;/a&gt; appears to have a regression between SR1 and SR2. The job will pass from&lt;br/&gt;
time to time, but for the most part it will fail in the Getmulti and Getsingle suites which&lt;br/&gt;
mount 500 instances of the netconf-testtool and then attempt to &quot;issue requests&quot; on&lt;br/&gt;
each device in order (essentially a GET to yang-ext:mount for each device). When the&lt;br/&gt;
Python tool doing those requests has trouble, the test case fails.&lt;/p&gt;

&lt;p&gt;In the sandbox, I ran a job with the Sr1 release which saw two failures in 35 iterations,&lt;br/&gt;
whereas using a distro built recently (so Sr2 bits) the results were 8 passes in 36 iterations.&lt;/p&gt;

&lt;p&gt;Jenkins job history in the web UI only goes back 30 builds or so, but I pulled all the&lt;br/&gt;
console logs since job 225 (May 1st &amp;#8211; just before SR1 was released) until the most&lt;br/&gt;
recent build #327. Some of those jobs were infra aborts, but it&apos;s approximately 90 data points.&lt;br/&gt;
Here is a &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1Kby1oVrh-_OTTVFxL2C3HonPof675NMOc-4jp2X2wiU/edit?usp=sharing&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;hacky spreadsheet &lt;/a&gt; to illustrate results.&lt;/p&gt;

&lt;p&gt;There seem to be two points of interest. The job was mostly passing for quite some&lt;br/&gt;
time, then had 10 straight failures before becoming stable again. That happened at&lt;br/&gt;
job #265 (June 6th). Then job #301 failed again (July 11th), and the job has been unstable&lt;br/&gt;
from that point on.&lt;/p&gt;

&lt;p&gt;Nothing in the netconf project stands out to me in the June 6th timeframe, but there&lt;br/&gt;
was an MRI bump on &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/91083&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;July 10th&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is a &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/326/odl_1/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;karaf.log &lt;/a&gt; from the most recent failed job. There is some ugliness that&lt;br/&gt;
may be a place to start looking. Search for &quot;Getmulti.Issue_Requests_On_Devices&quot; and&lt;br/&gt;
scroll down. It seems that one mount session went down, followed by a lot of&lt;br/&gt;
&quot;NETCONF operation failed&quot; exceptions.&lt;/p&gt;</description>
                <environment></environment>
        <key id="33121">NETCONF-716</key>
            <summary>regression in netconf-scale CSIT between Mg Sr1 and Sr2</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.opendaylight.org/images/icons/priorities/blocker.svg">Highest</priority>
                        <status id="10004" iconUrl="https://jira.opendaylight.org/images/icons/status_generic.gif" description="">Verified</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="jluhrsen">Jamo Luhrsen</assignee>
                                    <reporter username="jluhrsen">Jamo Luhrsen</reporter>
                        <labels>
                    </labels>
                <created>Thu, 30 Jul 2020 22:24:15 +0000</created>
                <updated>Fri, 4 Sep 2020 16:45:57 +0000</updated>
                            <resolved>Fri, 4 Sep 2020 08:25:46 +0000</resolved>
                                    <version>Magnesium SR2</version>
                    <version>Sodium SR4</version>
                                    <fixVersion>Aluminium</fixVersion>
                    <fixVersion>1.13.0</fixVersion>
                    <fixVersion>Magnesium SR3</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="68502" author="rovarga" created="Fri, 31 Jul 2020 08:10:09 +0000"  >&lt;p&gt;So I took a look at&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/326/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/326/&lt;/a&gt; (failed) versus &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/299/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/299/&lt;/a&gt; (successful)&lt;/p&gt;

&lt;p&gt;and one thing that is immediately noticeable is that the logs from the testtool are different: testtool--netconf-scale-txt-GetSingle in the failed case is a lot smaller and reports an inability to bind ports 17830 and up &#8211; which seems to indicate a failed cleanup beforehand.&lt;/p&gt;

&lt;p&gt;Also the Getmulti log is roughly 39% of the successful run &#8211; where success covers ~38 minutes and failure covers ~19 minutes. I strongly suspect there is something fishy in the test suite.&lt;/p&gt;

</comment>
                            <comment id="68503" author="rovarga" created="Fri, 31 Jul 2020 08:10:56 +0000"  >&lt;p&gt;The other thing is we have a bunch of InterruptedByTimeoutExceptions in SessionTimeoutListener. I think these require client/server heartbeats to be enabled ... which we tracked somewhere else.&lt;/p&gt;</comment>
                            <comment id="68518" author="jluhrsen" created="Tue, 4 Aug 2020 22:43:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=rovarga&quot; class=&quot;user-hover&quot; rel=&quot;rovarga&quot;&gt;rovarga&lt;/a&gt; noticed there are a lot of device disconnects in the karaf.log, even in the passing cases. For reference, comparing two logs &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/326/odl_1/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;1 &lt;/a&gt;&#160; &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netconf-csit-1node-scale-only-magnesium/224/odl_1/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2&lt;/a&gt; where&lt;br/&gt;
 the first is something recent with this regression and the 2nd is from early May (the SR1 timeframe). The first has 193 instances of &quot;Session went down&quot;&lt;br/&gt;
 in the karaf.log whereas the 2nd has none.&lt;/p&gt;

&lt;p&gt;In mid-May, we got the work to &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/89825&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;shade sshd &lt;/a&gt; in order to disassociate from the Karaf-provided sshd (to fix a bug with karaf shell ssh, iirc). That timeframe&lt;br/&gt;
 doesn&apos;t match up with the description where the failures became prevalent around July 11th. However, there was a &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/89826&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;mina-sshd upgrade &lt;/a&gt; on July 7th&lt;br/&gt;
 that might be a clue.&lt;/p&gt;

&lt;p&gt;I&apos;m experimenting in that area with something like &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/91879/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this &lt;/a&gt; to see if it gets rid of the regression.&lt;/p&gt;</comment>
                            <comment id="68581" author="rovarga" created="Wed, 26 Aug 2020 09:25:40 +0000"  >&lt;p&gt;Alright, so downgrading mina-sshd seems to have stabilized the test suite, but we are still seeing timeouts. I&apos;ll try longevity locally and hopefully will be able to make some sense of this.&lt;/p&gt;</comment>
                            <comment id="68584" author="rovarga" created="Wed, 26 Aug 2020 16:09:11 +0000"  >&lt;p&gt;Okay, the two additional patches seem to stabilize sessions on the use cases tested by the CSIT suite. Let&apos;s bring them in and see if they restore sanity.&lt;/p&gt;</comment>
                            <comment id="68585" author="jluhrsen" created="Wed, 26 Aug 2020 16:13:09 +0000"  >&lt;p&gt;I have run two jobs in the sandbox since yesterday on aluminium. One is running with the distribution created from&lt;br/&gt;
the distro check job in the master (silicon) &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/92285&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;patch &lt;/a&gt; that reverted the sshd upgrade&lt;br/&gt;
while the other is running the latest aluminium distro, which doesn&apos;t have that revert. The first job failed twice in 37 runs and the&lt;br/&gt;
second failed 7 times in 37. I was hoping it would be more concrete with 100% passing in the first job, but it&apos;s still a big difference.&lt;/p&gt;

&lt;p&gt;just fyi.&lt;/p&gt;</comment>
                            <comment id="68586" author="jluhrsen" created="Wed, 26 Aug 2020 16:13:36 +0000"  >&lt;p&gt;can you point me to the two patches?&lt;/p&gt;</comment>
                            <comment id="68587" author="rovarga" created="Wed, 26 Aug 2020 18:28:56 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/q/I98ef51b4346c2aee0c9c87d58035bc08ef2c77d1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/q/I98ef51b4346c2aee0c9c87d58035bc08ef2c77d1&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://git.opendaylight.org/gerrit/q/I2c6ed07b5d3b29f6524b0ed99067895c4dfdc020&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/q/I2c6ed07b5d3b29f6524b0ed99067895c4dfdc020&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both will be part of the next autoreleases.&lt;/p&gt;</comment>
                            <comment id="68588" author="rovarga" created="Wed, 26 Aug 2020 18:31:23 +0000"  >&lt;p&gt;To throw some variance across branches and get some results overnight, &lt;a href=&quot;https://git.opendaylight.org/gerrit/c/netconf/+/92299&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/c/netconf/+/92299&lt;/a&gt; will re-instate sshd-2.5.1 on Silicon and we can compare the results.&lt;/p&gt;</comment>
                            <comment id="68589" author="rovarga" created="Thu, 27 Aug 2020 09:29:40 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-silicon/23/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-silicon/23/&lt;/a&gt; succeeded with sshd-2.5.1, the only timeouts are on Karaf SSH. I have given it another try to see where it is.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-aluminium/181/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-aluminium/181/&lt;/a&gt; succeeded with sshd-2.3.0, same as above.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-magnesium/355/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-scale-only-magnesium/355/&lt;/a&gt; seems to contain some failure &#8211; but we&apos;ll need to have it finish before we know more.&lt;/p&gt;

&lt;p&gt;I have triggered a Sodium AR build to get a read on that branch, too.&lt;/p&gt;

</comment>
                            <comment id="68595" author="rovarga" created="Fri, 4 Sep 2020 08:26:27 +0000"  >&lt;p&gt;Everything except Mg is stabilized. I will open a separate issue to track the Mg failure.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10300">
                    <name>Issue split</name>
                                            <outwardlinks description="split to">
                                        <issuelink>
            <issuekey id="33214">INTTEST-123</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03u67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>