<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1772] Clustering Issues Error writing to datastore and AskTimeoutException</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1772</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;Description of problem:&lt;br/&gt;
 Our setup:&lt;br/&gt;
 3 OpenStack controllers&lt;br/&gt;
 3 clustered ODL instances&lt;br/&gt;
 3 Compute nodes&lt;/p&gt;

&lt;p&gt;When running a Browbeat+Rally Scenario which does:&lt;br/&gt;
 1. Create two networks&lt;br/&gt;
 2. Create two subnets&lt;br/&gt;
 3. Create router&lt;br/&gt;
 4. Attach each of these subnets to the router&lt;/p&gt;

&lt;p&gt;for a total of 500 times at concurrencies of 8, 16, and 32, we see journal entries piling up in the opendaylightjournal table in the ovs_neutron DB. Inspecting the neutron-server logs shows a large number of HTTP client errors and timeouts. For example, tracing the port f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73&lt;/p&gt;

&lt;p&gt;We see&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal [req-cc3661d4-fb99-47bc-bb85-f64b4d1a496d - - - - -] Error while processing update port f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73: ReadTimeout: HTTPConnectionPool(host=&apos;172.16.0.17&apos;, port=8081): Read timed out. (read timeout=10)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal Traceback (most recent call last):
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/networking_odl/journal/journal.py&quot;, line 245, in _sync_entry
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal self.client.sendjson(method, urlpath, to_send)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/networking_odl/common/client.py&quot;, line 106, in sendjson
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal &apos;body&apos;: obj})
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/oslo_utils/excutils.py&quot;, line 220, in __exit__
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal self.force_reraise()
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/oslo_utils/excutils.py&quot;, line 196, in force_reraise
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal six.reraise(self.type_, self.value, self.tb)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/networking_odl/common/client.py&quot;, line 98, in sendjson
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal self.request(method, urlpath, data))
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/networking_odl/common/client.py&quot;, line 91, in request
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal method, url=url, headers=headers, data=data, timeout=self.timeout)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/requests/sessions.py&quot;, line 475, in request
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal resp = self.send(prep, **send_kwargs)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/requests/sessions.py&quot;, line 596, in send
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal r = adapter.send(request, **kwargs)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal File &quot;/usr/lib/python2.7/site-packages/requests/adapters.py&quot;, line 499, in send
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal raise ReadTimeout(e, request=request)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal ReadTimeout: HTTPConnectionPool(host=&apos;172.16.0.17&apos;, port=8081): Read timed out. (read timeout=10)
 2017-09-14 07:50:08.593 202374 ERROR networking_odl.journal.journal&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It looks like networking-odl isn&apos;t able to connect to ODL. Looking at the karaf logs on controller-1 for this port ID, we see:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1894842:2017-09-14 04:18:43,228 | ERROR | ChangeHandler-16 | MDSALUtil | 319 - org.opendaylight.genius.mdsalutil-api - 0.2.2.SNAPSHOT | Error writing to datastore (path, data) : (KeyedInstanceIdentifier
{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.neutronvpn.rev150602.router.interfaces.map.router.interfaces.Interfaces, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.neutronvpn.rev150602.RouterInterfacesMap, org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.neutronvpn.rev150602.router.interfaces.map.RouterInterfaces[key=RouterInterfacesKey [_routerId=Uuid [_value=4d78c0f1-1db8-42e1-bde8-c48cad4bc351]]], org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.neutronvpn.rev150602.router.interfaces.map.router.interfaces.Interfaces[key=InterfacesKey [_interfaceId=f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73]]]}
, Interfaces [_interfaceId=f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73, _key=InterfacesKey [_interfaceId=f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73], augmentation=[]])
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just above this message we also see several AskTimeoutExceptions, such as the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2017-09-14 04:18:13,208 | ERROR | t-dispatcher-134 | LocalThreePhaseCommitCohort | 211 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-175117-0 on backend
 akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-default-config#1400650520)] after [30000 ms]. Sender[null] sent message of type &quot;org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction&quot;.
 at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[185:com.typesafe.akka.actor:2.4.18]
 at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[185:com.typesafe.akka.actor:2.4.18]
 at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
 at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
 at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
 at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[185:com.typesafe.akka.actor:2.4.18]
 at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[185:com.typesafe.akka.actor:2.4.18]
 at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[185:com.typesafe.akka.actor:2.4.18]
 at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[185:com.typesafe.akka.actor:2.4.18]
 at java.lang.Thread.run(Thread.java:748)[:1.8.0_141]&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Due to the state of the ODL cluster, networking-odl has trouble connecting and creating resources. This leads to high mysqld CPU usage (because of all the rows piled up) and also very high CPU usage by the karaf process (around 30 cores).&lt;/p&gt;

&lt;p&gt;We suspect the ODL issues as a possible cause of the high mysqld CPU usage because, when we tested networking-odl with some unmerged patches that delete a journal row as soon as it is completed, performance improved and CPU usage dropped, since there were fewer rows to scan.&lt;/p&gt;

&lt;p&gt;Version-Release number of selected component (if applicable):&lt;br/&gt;
 OpenStack Pike (12)&lt;br/&gt;
 opendaylight-6.2.0-0.1.20170913snap58.el7.noarch&lt;br/&gt;
 python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost.noarch&lt;/p&gt;

&lt;p&gt;How reproducible:&lt;br/&gt;
 Quite consistent&lt;/p&gt;

&lt;p&gt;Steps to Reproduce:&lt;br/&gt;
 1. Deploy OpenStack with ODL as backend&lt;br/&gt;
 2. Run Perf/Scale tests that create a large number of neutron resources&lt;/p&gt;

&lt;p&gt;Actual results:&lt;br/&gt;
 Resources aren&apos;t being created in ODL. They keep piling up in the opendaylightjournal table due to HTTP connection errors with ODL (the default timeout for the HTTP connection is 10s).&lt;/p&gt;

&lt;p&gt;Expected results:&lt;br/&gt;
 No exceptions in logs and resources are created&lt;/p&gt;

&lt;p&gt;Additional info:&lt;/p&gt;

&lt;p&gt;Karaf logs from controller-0, 1, and 2&lt;br/&gt;
 neutron-server logs from controller-0&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://8.43.86.1:8088/smalleni/clustering-logs.tar.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://8.43.86.1:8088/smalleni/clustering-logs.tar.gz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Port ID that can be used to track the problem: f9f307c4-bcc7-48c2-863f-2cbc8b0b8e73&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26326">CONTROLLER-1772</key>
            <summary>Clustering Issues Error writing to datastore and AskTimeoutException</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Cannot Reproduce</resolution>
                                        <assignee username="oleksii.mozghovyi">Oleksii Mozghovyi</assignee>
                                    <reporter username="smalleni@redhat.com">Sai Sindhur Malleni</reporter>
                        <labels>
                    </labels>
                <created>Sat, 16 Sep 2017 02:20:02 +0000</created>
                <updated>Wed, 2 Mar 2022 17:59:11 +0000</updated>
                            <resolved>Fri, 2 Jul 2021 12:57:40 +0000</resolved>
                                    <version>Carbon</version>
                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="69037" author="JIRAUSER12941" created="Wed, 7 Apr 2021 12:43:45 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=smalleni%40redhat.com&quot; class=&quot;user-hover&quot; rel=&quot;smalleni@redhat.com&quot;&gt;smalleni@redhat.com&lt;/a&gt;, is this problem still happening when integrating with the newer OpenDaylight releases? Could you please point me to the logs (the ticket is pretty old, and the log files link doesn&apos;t work anymore)?&lt;/p&gt;</comment>
                            <comment id="69038" author="JIRAUSER12941" created="Wed, 7 Apr 2021 12:49:11 +0000"  >&lt;p&gt;One more question about the test environment - what was the amount of CPU/RAM allocated for the ODL cluster nodes?&lt;/p&gt;</comment>
                            <comment id="69235" author="JIRAUSER12941" created="Wed, 26 May 2021 14:53:52 +0000"  >&lt;p&gt;I don&apos;t think we can reproduce this kind of issue in our dev/lfn only environment, so maybe it would be better to close it.&lt;/p&gt;</comment>
                            <comment id="70613" author="JIRAUSER14501" created="Wed, 2 Mar 2022 17:59:11 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=oleksii.mozghovyi&quot; class=&quot;user-hover&quot; rel=&quot;oleksii.mozghovyi&quot;&gt;oleksii.mozghovyi&lt;/a&gt;, we are also seeing the AskTimeoutException in the current release (Phosphorus SR2); see &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-2032&quot; title=&quot;Phosphorus SR1 - AskTimeoutException while trying to mount multiple Netconf devices at a time&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-2032&quot;&gt;CONTROLLER-2032&lt;/a&gt; (Phosphorus SR1 - AskTimeoutException while trying to mount multiple Netconf devices at a time).&lt;/p&gt;

&lt;p&gt;Scenario: on a 3-node cluster, we tried to mount 250 Netconf devices at the same time. The POST requests are processed successfully, but when the actual mounting happens, an AskTimeout error occurs after 40-50 device mounts and further mounts are stuck.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9173</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=9173]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02smv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>