<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
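
A minimal Python sketch of building such a request URL. The issue XML path below is an assumption (the usual location of the JIRA XML issue view), and with_fields is a hypothetical helper name, not part of JIRA:

```python
from urllib.parse import urlencode

# Assumed URL of this issue's XML view; adjust it if your JIRA instance differs.
ISSUE_XML_URL = (
    "https://jira.opendaylight.org/si/jira.issueviews:issue-xml/"
    "CONTROLLER-1697/CONTROLLER-1697.xml"
)


def with_fields(url, fields):
    # Repeat the 'field' parameter once per requested field name.
    query = urlencode([("field", name) for name in fields])
    separator = "&" if "?" in url else "?"
    return url + separator + query


if __name__ == "__main__":
    print(with_fields(ISSUE_XML_URL, ["key", "summary"]))
```

The printed URL ends in '?field=key&field=summary', matching the example above.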
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1697] entity-owners read too slow: Transport failure detector triggered.</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1697</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;This is a Carbon-SR1 bug, in the sense that it has been seen on a build based on a change &lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; that has a +2 but will not be merged into the Carbon release.&lt;/p&gt;

&lt;p&gt;This is similar to &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1672&quot; title=&quot;Restconf slow to respond when the member is under load&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1672&quot;&gt;&lt;del&gt;CONTROLLER-1672&lt;/del&gt;&lt;/a&gt; but some symptoms are different.&lt;br/&gt;
From Robot's point of view, this is still just a restconf read &lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; not finishing in time (30 seconds). This time we are reading entity-owners (a module-based shard), because we are testing Singleton Service (tell-based protocol) by repeatedly isolating the member with the active instance.&lt;/p&gt;

&lt;p&gt;The test was run with debug log verbosity on some loggers. Member-1 karaf log is here &lt;span class=&quot;error&quot;&gt;&amp;#91;2&amp;#93;&lt;/span&gt;, but it is from Sandbox, so it will be deleted over the weekend.&lt;/p&gt;

&lt;p&gt;The first iteration passed.&lt;br/&gt;
At its start, the member-1 client connected to the member-2 backend:&lt;br/&gt;
2017-05-24 14:13:34,374 | INFO  | lt-dispatcher-20 | ClientActorBehavior              | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: replaced connection ConnectingClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=1} with ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=1, backend=ShardBackendInfo{actor=Actor&lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550/user/shardmanager-operational/member-2-shard-entity-ownership-operational#-605622033&amp;#93;&lt;/span&gt;, sessionId=1, version=BORON, maxMessages=1000, cookie=1, shard=entity-ownership, dataTree present=false}}&lt;/p&gt;

&lt;p&gt;Member-1 was isolated in the first iteration. After rejoin, the test spent 1 minute making sure responses from the active singleton instance did not change.&lt;/p&gt;

&lt;p&gt;The next read from entity ownership is at the start of the second iteration. The log shows that the member-1 client detected a mismatch:&lt;br/&gt;
2017-05-24 14:16:02,976 | INFO  | ult-dispatcher-3 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational Received follower initial sync status for member-1-shard-entity-ownership-operational status sync done true&lt;br/&gt;
2017-05-24 14:17:09,242 | DEBUG | qtp841096586-310 | AbstractClientHistory            | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | org.opendaylight.controller.cluster.databroker.actors.dds.SingleClientHistory@6bed1f87: creating a new snapshot member-1-datastore-operational-fe-0-txn-1-0&lt;br/&gt;
2017-05-24 14:17:09,242 | DEBUG | qtp841096586-310 | ProxyHistory                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Allocated proxy member-1-datastore-operational-fe-0-txn-1-1 for transaction member-1-datastore-operational-fe-0-txn-1-0&lt;br/&gt;
2017-05-24 14:17:09,242 | DEBUG | qtp841096586-310 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-1-datastore-operational-fe-0-txn-1-1, state=open} allocated sequence 0&lt;br/&gt;
2017-05-24 14:17:09,243 | DEBUG | qtp841096586-310 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-1-datastore-operational-fe-0-txn-1-1, state=open} sending request ReadTransactionRequest{target=member-1-datastore-operational-fe-0-txn-1-1, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$b#-460505189], snapshotOnly=true, path=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:clustering:entity-owners?revision=2015-08-04)entity-owners} callback org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction$$Lambda$165/912139580@3c26e7ca&lt;br/&gt;
2017-05-24 14:17:09,243 | DEBUG | qtp841096586-310 | AbstractClientConnection         | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: scheduling timeout in 30 seconds&lt;br/&gt;
2017-05-24 14:17:09,243 | DEBUG | qtp841096586-310 | TransmitQueue                    | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | Queue ConnectionEntry{request=ReadTransactionRequest{target=member-1-datastore-operational-fe-0-txn-1-1, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$b#-460505189], snapshotOnly=true, path=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:clustering:entity-owners?revision=2015-08-04)entity-owners}, enqueuedTicks=684758707440} transmitting entry {}&lt;br/&gt;
2017-05-24 14:17:09,267 | INFO  | ult-dispatcher-3 | ClientActorBehavior              | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=1, backend=ShardBackendInfo{actor=Actor&lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550/user/shardmanager-operational/member-2-shard-entity-ownership-operational#-605622033&amp;#93;&lt;/span&gt;, sessionId=1, version=BORON, maxMessages=1000, cookie=1, shard=entity-ownership, dataTree present=false}} indicated no sequencing mismatch on member-1-datastore-operational-fe-0-txn-1-1 sequence 0, reconnecting it&lt;br/&gt;
org.opendaylight.controller.cluster.access.commands.OutOfSequenceEnvelopeException: Expecting envelope 0&lt;br/&gt;
&lt;br/&gt;
and after some replays reconnected to member-2 backend (again):&lt;br/&gt;
2017-05-24 14:17:09,279 | INFO  | ult-dispatcher-3 | ClientActorBehavior              | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: replaced connection ReconnectingClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=1, backend=ShardBackendInfo{actor=Actor&lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550/user/shardmanager-operational/member-2-shard-entity-ownership-operational#-605622033&amp;#93;&lt;/span&gt;, sessionId=1, version=BORON, maxMessages=1000, cookie=1, shard=entity-ownership, dataTree present=false}} with ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=1, backend=ShardBackendInfo{actor=Actor&lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550/user/shardmanager-operational/member-2-shard-entity-ownership-operational#-605622033&amp;#93;&lt;/span&gt;, sessionId=2, version=BORON, maxMessages=1000, cookie=1, shard=entity-ownership, dataTree present=false}}&lt;/p&gt;

&lt;p&gt;In the second iteration, the active instance was on member-2, so member-2 was isolated. After waiting for a new leader (member-1 was elected), member-1 was asked about the new active instance. This is the entity-owners read that was stuck for at least 30 seconds. Here is the complete end of karaf.log:&lt;br/&gt;
2017-05-24 14:17:20,110 | INFO  | lt-dispatcher-31 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received role changed for member-1-shard-entity-ownership-operational from Candidate to Leader&lt;br/&gt;
2017-05-24 14:17:20,110 | INFO  | lt-dispatcher-31 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: All Shards are ready - data store operational is ready, available count is 0&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | AbstractClientHistory            | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | org.opendaylight.controller.cluster.databroker.actors.dds.SingleClientHistory@6bed1f87: creating a new snapshot member-1-datastore-operational-fe-0-txn-2-0&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | ProxyHistory                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Allocated proxy member-1-datastore-operational-fe-0-txn-2-1 for transaction member-1-datastore-operational-fe-0-txn-2-0&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-1-datastore-operational-fe-0-txn-2-1, state=open} allocated sequence 0&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-1-datastore-operational-fe-0-txn-2-1, state=open} sending request ReadTransactionRequest{target=member-1-datastore-operational-fe-0-txn-2-1, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$b#-460505189], snapshotOnly=true, path=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:clustering:entity-owners?revision=2015-08-04)entity-owners} callback org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction$$Lambda$165/912139580@1482dd4a&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | AbstractClientConnection         | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: timer already scheduled&lt;br/&gt;
2017-05-24 14:17:21,558 | DEBUG | qtp841096586-310 | TransmitQueue                    | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | Queue ConnectionEntry{request=ReadTransactionRequest{target=member-1-datastore-operational-fe-0-txn-2-1, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$b#-460505189], snapshotOnly=true, path=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:clustering:entity-owners?revision=2015-08-04)entity-owners}, enqueuedTicks=697074151026} transmitting entry {}&lt;br/&gt;
2017-05-24 14:17:28,858 | INFO  | lt-dispatcher-32 | ProtocolStateActor               | 174 - com.typesafe.akka.slf4j - 2.4.17 | No response from remote. Transport failure detector triggered. (internal state was Open)&lt;br/&gt;
2017-05-24 14:17:28,861 | WARN  | lt-dispatcher-32 | ReliableDeliverySupervisor       | 174 - com.typesafe.akka.slf4j - 2.4.17 | Association with remote system &lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550&amp;#93;&lt;/span&gt; has failed, address is now gated for &lt;span class=&quot;error&quot;&gt;&amp;#91;5000&amp;#93;&lt;/span&gt; ms. Reason: &lt;span class=&quot;error&quot;&gt;&amp;#91;Disassociated&amp;#93;&lt;/span&gt; &lt;br/&gt;
2017-05-24 14:17:32,778 | INFO  | ult-dispatcher-6 | ProtocolStateActor               | 174 - com.typesafe.akka.slf4j - 2.4.17 | No response from remote. Transport failure detector triggered. (internal state was Open)&lt;br/&gt;
2017-05-24 14:17:39,288 | DEBUG | ult-dispatcher-3 | AbstractClientConnection         | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: scheduling timeout in 30 seconds&lt;br/&gt;
2017-05-24 14:17:48,899 | INFO  | ult-dispatcher-5 | ProtocolStateActor               | 174 - com.typesafe.akka.slf4j - 2.4.17 | No response from remote for outbound association. Associate timed out after &lt;span class=&quot;error&quot;&gt;&amp;#91;15000 ms&amp;#93;&lt;/span&gt;.&lt;br/&gt;
2017-05-24 14:17:48,900 | WARN  | ult-dispatcher-5 | ReliableDeliverySupervisor       | 174 - com.typesafe.akka.slf4j - 2.4.17 | Association with remote system &lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550&amp;#93;&lt;/span&gt; has failed, address is now gated for &lt;span class=&quot;error&quot;&gt;&amp;#91;5000&amp;#93;&lt;/span&gt; ms. Reason: [Association failed with &lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.29.14.172:2550&amp;#93;&lt;/span&gt;] Caused by: [No response from remote for outbound association. Associate timed out after &lt;span class=&quot;error&quot;&gt;&amp;#91;15000 ms&amp;#93;&lt;/span&gt;.]&lt;/p&gt;
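
&lt;p&gt;For anyone filtering the full karaf.log, the pipe-separated layout of the lines quoted above can be split into fields with a short Python sketch. The field names and the parse_karaf_line helper are illustrative assumptions, not part of karaf or OpenDaylight.&lt;/p&gt;

```python
from typing import NamedTuple


class KarafLine(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


def parse_karaf_line(line):
    # Split on the first five pipes only, so pipes inside the message survive.
    parts = [part.strip() for part in line.split("|", 5)]
    if len(parts) != 6:
        return None  # continuation line (for example an exception), not a new record
    return KarafLine(*parts)


# One of the akka lines quoted in this report.
SAMPLE = (
    "2017-05-24 14:17:28,858 | INFO  | lt-dispatcher-32 | ProtocolStateActor"
    "               | 174 - com.typesafe.akka.slf4j - 2.4.17 | No response from"
    " remote. Transport failure detector triggered. (internal state was Open)"
)

if __name__ == "__main__":
    record = parse_karaf_line(SAMPLE)
    print(record.level, record.logger, record.message)
```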

&lt;p&gt;Things to note:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;No reconnect attempted.&lt;/li&gt;
	&lt;li&gt;&quot;timer already scheduled&quot; at 14:17:21,558 followed by &quot;scheduling timeout in 30 seconds&quot; at 14:17:39,288&lt;/li&gt;
	&lt;li&gt;&quot;No response from remote. Transport failure detector triggered. (internal state was Open)&quot; twice.&lt;/li&gt;
&lt;/ul&gt;
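
&lt;p&gt;The timer arithmetic behind the second note can be checked with a short Python sketch. The timestamps are copied from the log excerpts above; the helper names are illustrative only. The roughly 30.045 seconds between the two &quot;scheduling timeout in 30 seconds&quot; lines would be consistent with the first timer firing and being rescheduled, although the log alone does not prove that.&lt;/p&gt;

```python
from datetime import datetime

# Timestamps copied from the karaf.log excerpts quoted in this report.
EVENTS = [
    ("14:17:09,243", "scheduling timeout in 30 seconds"),
    ("14:17:21,558", "timer already scheduled"),
    ("14:17:28,858", "Transport failure detector triggered"),
    ("14:17:39,288", "scheduling timeout in 30 seconds"),
]


def parse(stamp):
    # karaf puts a comma before the milliseconds; %f accepts 1 to 6 digits.
    return datetime.strptime("2017-05-24 " + stamp, "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(first, second):
    return (parse(second) - parse(first)).total_seconds()


def gaps(events):
    # Pair each event with the seconds elapsed since the previous one.
    result = []
    for (prev, _), (cur, label) in zip(events, events[1:]):
        result.append((label, seconds_between(prev, cur)))
    return result


if __name__ == "__main__":
    for label, delta in gaps(EVENTS):
        print(f"{delta:7.3f} s  {label}")
    # Gap between the two "scheduling timeout" lines.
    print(seconds_between("14:17:09,243", "14:17:39,288"))
```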


&lt;p&gt;This time there is no transaction stream keeping anything busy, so this looks like a straightforward bug caused by wrong timers or missing reachability processing.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/57751&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/57751&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-cs-partnheal-longevity-only-carbon/12/archives/log.html.gz#s1-s2-t1-k3-k1-k1-k1-k1-k1-k1-k2-k1-k1-k4-k6-k1-k1-k3-k1-k3-k1-k3-k1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-cs-partnheal-longevity-only-carbon/12/archives/log.html.gz#s1-s2-t1-k3-k1-k1-k1-k1-k1-k1-k2-k1-k1-k4-k6-k1-k1-k3-k1-k3-k1-k3-k1&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;2&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-cs-partnheal-longevity-only-carbon/12/archives/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-cs-partnheal-longevity-only-carbon/12/archives/odl1_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26251">CONTROLLER-1697</key>
            <summary>entity-owners read too slow: Transport failure detector triggered.</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Cannot Reproduce</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="vrpolak">Vratko Polak</reporter>
                        <labels>
                    </labels>
                <created>Thu, 25 May 2017 10:41:48 +0000</created>
                <updated>Tue, 25 Jul 2023 08:24:40 +0000</updated>
                            <resolved>Mon, 29 May 2017 12:28:06 +0000</resolved>
                                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                                                        <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8547</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=8547]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02s67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>