<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 20:02:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[INFRAUTILS-29] CSIT hit infrautils.metrics.ThreadsWatcher limit of 1000 threads</title>
                <link>https://jira.opendaylight.org/browse/INFRAUTILS-29</link>
                <project id="10129" key="INFRAUTILS">infrautils</project>
                    <description>&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=vorburger&quot; class=&quot;user-hover&quot; rel=&quot;vorburger&quot;&gt;vorburger&lt;/a&gt; filing this issue, based on &lt;a href=&quot;https://lists.opendaylight.org/pipermail/infrautils-dev/2018-February/000595.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://lists.opendaylight.org/pipermail/infrautils-dev/2018-February/000595.html&lt;/a&gt;, setting &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=jluhrsen&quot; class=&quot;user-hover&quot; rel=&quot;jluhrsen&quot;&gt;jluhrsen&lt;/a&gt; as Reporter; initially filing in infrautils, but will move to suitable project once we figure where these many threads come from.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Attached is a Thread Dump with &amp;gt;1000 threads in ODL produced by the infrautils.metrics.ThreadsWatcher (new; see &lt;a href=&quot;https://jira.opendaylight.org/browse/INFRAUTILS-22&quot; title=&quot;Automatic threads over threshold detection and logging&quot; class=&quot;issue-link&quot; data-issue-key=&quot;INFRAUTILS-22&quot;&gt;&lt;del&gt;INFRAUTILS-22&lt;/del&gt;&lt;/a&gt;, also FYI &lt;a href=&quot;https://jira.opendaylight.org/browse/INFRAUTILS-21&quot; title=&quot;Automatic deadlock detection logging&quot; class=&quot;issue-link&quot; data-issue-key=&quot;INFRAUTILS-21&quot;&gt;&lt;del&gt;INFRAUTILS-21&lt;/del&gt;&lt;/a&gt;) which I extracted out of the 1.1 GB odl1_karaf.log.1 from &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/19/jamo-netvirt-csit-1node-openstack-pike-upstream-stateful-oxygen/4/odl_1/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/19/jamo-netvirt-csit-1node-openstack-pike-upstream-stateful-oxygen/4/odl_1/&lt;/a&gt; ... &lt;/p&gt;

&lt;p&gt;Why are there so many threads? What created them?&lt;/p&gt;</description>
                <environment></environment>
        <key id="29301">INFRAUTILS-29</key>
            <summary>CSIT hit infrautils.metrics.ThreadsWatcher limit of 1000 threads</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.opendaylight.org/images/icons/priorities/blocker.svg">Highest</priority>
                        <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="vorburger">Michael Vorburger</assignee>
                                    <reporter username="jluhrsen">Jamo Luhrsen</reporter>
                        <labels>
                            <label>patch_merged</label>
                    </labels>
                <created>Wed, 28 Feb 2018 10:05:28 +0000</created>
                <updated>Fri, 2 Mar 2018 15:39:48 +0000</updated>
                            <resolved>Fri, 2 Mar 2018 15:39:48 +0000</resolved>
                                    <version>Oxygen</version>
                    <version>Fluorine</version>
                                    <fixVersion>Oxygen</fixVersion>
                    <fixVersion>Fluorine</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="61336" author="vorburger" created="Wed, 28 Feb 2018 10:32:35 +0000"  >&lt;p&gt;Whoa, hang on a sec, what are all these threads in the attached essentially &lt;tt&gt;jstack&lt;/tt&gt; equivalent:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;    &quot;Karaf ssh console user karaf&quot; id=... state=TIMED_WAITING&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;let&apos;s count them:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;grep &quot;Karaf ssh&quot; infrautils.metrics.ThreadsWatcher.txt | wc -l
430&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are 1000 - 430 = 570 other threads we could look more into as well, but this jumps out as the biggest immediate problem... That 430 is a lot of threads for 1 Karaf SSH session, no? &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.opendaylight.org/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt; &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=jluhrsen&quot; class=&quot;user-hover&quot; rel=&quot;jluhrsen&quot;&gt;jluhrsen&lt;/a&gt; perhaps there isn&apos;t just 1 SSH session to Karaf, in CSIT... could that be? I don&apos;t suppose there could be some sort of issue more on the CSIT script side than in ODL / Karaf here? Like you guys aren&apos;t by any chance... I&apos;m not sure, like sending hundreds of ODL shell commands to Karaf from the CSIT robot thing and doing that over a new ssh connection each time and for some reason not closing the previous SSH session - kind of thing? Even if you are not intentionally doing it like that, maybe some bug in... Robot, or its SSH library?&lt;/p&gt;</comment>
                            <comment id="61344" author="shague@redhat.com" created="Wed, 28 Feb 2018 13:24:27 +0000"  >&lt;p&gt;Every suite and test is logged when they start. ROBOT Message: Starting suite blah, Starting test foo... So you would see at least 240 for that. Each test also has an mdsal dump, so another 235 or so. That could be the issue. Those commands were supposed to use a single connection to ODL but maybe something is busted there.&lt;/p&gt;

&lt;p&gt;Any ideas how to debug this - meaning to tell if on the odl side the sessions are open? We have journalctl logs on the odl nodes and I see many ssh connections opened, but most are followed by a close. There are only 30 so I think the karaf ssh sessions must not be hitting there.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-oxygen/965/odl_1/journalctl.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-stateful-oxygen/965/odl_1/journalctl.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="61345" author="vorburger" created="Wed, 28 Feb 2018 13:55:35 +0000"  >&lt;p&gt;&amp;gt; you would see at least 240 for that. Each test also (...), so another 235 or so&lt;/p&gt;

&lt;p&gt;that adds up to 400 something, roughly matching, so yeah that&apos;s probably it then...&lt;/p&gt;

&lt;p&gt;&amp;gt; Those commands were supposed to use a single connection to ODL&lt;/p&gt;

&lt;p&gt;what exactly do those &quot;commands&quot; technically do? separate SSH or (probably) new SSH for each?&lt;/p&gt;

&lt;p&gt;&amp;gt; Any ideas how to debug this - meaning to tell if on the odl side the sessions are open?&lt;/p&gt;

&lt;p&gt;dunno how to easily check for open ports inside the JVM, but &lt;tt&gt;jstack&lt;/tt&gt; and see those threads?&lt;/p&gt;

&lt;p&gt;&amp;gt; journalctl logs on the odl nodes and I see many ssh connections opened, but most are followed by a close&lt;/p&gt;

&lt;p&gt;not sure what journalctl shows here - is that really the connection of the CSIT scripts into Karaf? Isn&apos;t that showing OS system ssh logins? I guess we could somehow check the TCP socket for the SSH sessions to Karaf is closed? But maybe the problem isn&apos;t at that level - I mean it COULD also be a bug not in CSIT ssh client connections, but in Karaf&apos;s built-in ssh server itself... like perhaps it&apos;s dumb enough to keep a loose open thread for each session even after the SSH client disconnects? That would clearly cause a thread leak, over time, and be a &quot;Karaf&quot; bug (in quotes because this more likely would be a problem in whatever library does the SSH server for Karaf, not even Karaf itself).&lt;/p&gt;</comment>
                            <comment id="61355" author="vorburger" created="Wed, 28 Feb 2018 17:16:28 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=jluhrsen&quot; class=&quot;user-hover&quot; rel=&quot;jluhrsen&quot;&gt;jluhrsen&lt;/a&gt; FYI I was looking a bit more into this with &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=shague&quot; class=&quot;user-hover&quot; rel=&quot;shague&quot;&gt;shague&lt;/a&gt; ... here are some new conclusions:&lt;/p&gt;

&lt;p&gt;1. according to &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=shague&quot; class=&quot;user-hover&quot; rel=&quot;shague&quot;&gt;shague&lt;/a&gt;, when you wrote on &lt;a href=&quot;https://lists.opendaylight.org/pipermail/infrautils-dev/2018-February/000604.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://lists.opendaylight.org/pipermail/infrautils-dev/2018-February/000604.html&lt;/a&gt; that &quot;&lt;span class=&quot;error&quot;&gt;&amp;#91;b&amp;#93;&lt;/span&gt; had the same trouble and that was without odl-mdsal-trace&quot; that really meant &quot;without feature:install odl-mdsal-trace&quot; but still WITH &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/68447/6/csit/libraries/OpenStackOperations.robot&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/68447/6/csit/libraries/OpenStackOperations.robot&lt;/a&gt;, right?&lt;/p&gt;

&lt;p&gt;2. &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=shague&quot; class=&quot;user-hover&quot; rel=&quot;shague&quot;&gt;shague&lt;/a&gt; &lt;em&gt;&quot;ran sandbox with c/68447 removed and it &lt;a href=&quot;https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/netvirt-csit-1node-openstack-ocata-shague-upstream-stateful-oxygen/1/odl_1/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;looked good for logs&lt;/a&gt;&quot;&lt;/em&gt;. That confirms that what caused these thread leaks to suddenly pop up, with absolutely no code having changed anywhere otherwise, is c/68447.&lt;/p&gt;

&lt;p&gt;3. my gut now tells me this is more likely a problem in CSIT robot scripts than in Karaf&apos;s SSH server (first suspicion, above). Is it at all possible that that &lt;tt&gt;Issue_Command_On_Karaf_Console&lt;/tt&gt; thing somehow leaves hundreds of SSH sessions hanging open? I know nothing about that stuff, but e.g. seeing that in &lt;tt&gt;OpenStackOperations.robot&lt;/tt&gt; there is a &lt;tt&gt;SSHLibrary.Close All Connections&lt;/tt&gt; in line 1043 after one FOR &lt;tt&gt;Issue_Command_On_Karaf_Console    trace:transaction&lt;/tt&gt; but in line 560 there isn&apos;t - could that be a problem?&lt;/p&gt;</comment>
                            <comment id="61357" author="jluhrsen" created="Wed, 28 Feb 2018 22:50:41 +0000"  >&lt;p&gt;I sent an &lt;a href=&quot;https://lists.opendaylight.org/pipermail/integration-dev/2018-February/011190.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;email&lt;/a&gt; with some more details and observations. This is not anything to do with a bug in CSIT or a dependent library. We can get to the same state with a bash script doing &quot;ssh&quot; in a loop (properly opening/closing etc).&lt;/p&gt;

&lt;p&gt;Anyway, the gist of it is that running karaf shell commands is opening some threads and they remain open for 60s. If we do enough of these at a fast enough rate, we&apos;ll cross this 1k threshold.&lt;/p&gt;

&lt;p&gt;I&apos;m not sure what state we are in (thread count wise) in our CSIT environment, but I suppose we have to assume we are somewhere near that threshold to start with. I don&apos;t think CSIT is doing more than a handful of these things per second (if that. more like 1-2 every 10-300 seconds).&lt;/p&gt;</comment>
                            <comment id="61364" author="vorburger" created="Thu, 1 Mar 2018 09:53:49 +0000"  >&lt;p&gt;&amp;gt; get to the same state with a bash script doing &quot;ssh&quot; in a loop (properly opening/closing etc).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=jluhrsen&quot; class=&quot;user-hover&quot; rel=&quot;jluhrsen&quot;&gt;jluhrsen&lt;/a&gt; if you have it handy and wouldn&apos;t mind attaching that here, it could be useful for repro. in the future.&lt;/p&gt;</comment>
                            <comment id="61371" author="skitt@redhat.com" created="Thu, 1 Mar 2018 14:21:09 +0000"  >&lt;p&gt;BTW Karaf has a built-in &#8220;threads&#8221; command which outputs the thread dump, or a formatted list of threads with the &#8220;--list&#8221; option; you can combine that with &#8220;| wc -l&#8221; to get a quick measure of the number of running threads.&lt;/p&gt;</comment>
                            <comment id="61382" author="klou" created="Thu, 1 Mar 2018 20:11:43 +0000"  >&lt;p&gt;This is the patch ready to be merged: &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/68937/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/68937/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="61383" author="jluhrsen" created="Thu, 1 Mar 2018 22:23:30 +0000"  >&lt;blockquote&gt;&lt;p&gt;&amp;gt; get to the same state with a bash script doing &quot;ssh&quot; in a loop (properly opening/closing etc).&lt;/p&gt;

&lt;p&gt;Jamo Luhrsen if you have it handy and wouldn&apos;t mind attaching that here, it could be useful for repro. in the future.&lt;/p&gt;&lt;/blockquote&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; i in {1..5000}; &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; sshpass -p karaf ssh -p 8101 karaf@localhost -o StrictHostKeyChecking=no feature:list -i ; done&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="61384" author="jluhrsen" created="Fri, 2 Mar 2018 01:17:06 +0000"  >&lt;blockquote&gt;&lt;p&gt;BTW Karaf has a built-in &#8220;threads&#8221; command which outputs the thread dump, or a formatted list of threads with the &#8220;--list&#8221; option; you can combine that with &#8220;| wc -l&#8221; to get a quick measure of the number of running threads.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;cool, this can be added to CSIT at some point if we think it&apos;s valuable. The fix to calm down the logging&lt;br/&gt;
is merged, so I think we are safe to add something like this. Here&apos;s &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/68982/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;something &lt;/a&gt; along those lines. &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="14438" name="infrautils.metrics.ThreadsWatcher.txt" size="1660770" author="vorburger" created="Wed, 28 Feb 2018 10:05:22 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03ax3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>