<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1707] Client not reconnecting successfully after leader movement</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1707</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;This is somewhat similar to &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1705&quot; title=&quot;Client on a follower prefix-based shard sometimes does not reconnect when the previous leader is isolated&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1705&quot;&gt;&lt;del&gt;CONTROLLER-1705&lt;/del&gt;&lt;/a&gt;, but this time the initial leader movement was caused by make-leader-local. The test was running on Sandbox with codebase &lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt;. Out of 6 similar test cases, this happened in only one.&lt;/p&gt;

&lt;p&gt;The test case was using a module-based shard and the tell-based protocol, and it was moving leadership away from member-3, where the transaction-writer was running.&lt;br/&gt;
Its huge karaf.log &lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; contains many debug lines; here is the relevant non-debug segment:&lt;/p&gt;

&lt;p&gt;2017-06-06 17:36:11,744 | INFO  | ult-dispatcher-5 | ClientActorBehavior              | 197 - org.opendaylight.controller.cds-access-client - 1.1.1.SNAPSHOT | member-3-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-3-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor&lt;a href=&quot;#1462585811&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;akka://opendaylight-cluster-data/user/shardmanager-config/member-3-shard-default-config#1462585811&lt;/a&gt;, sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree present=true}} reconnecting as ReconnectingClientConnection{client=ClientIdentifier{frontend=member-3-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor&lt;a href=&quot;#1462585811&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;akka://opendaylight-cluster-data/user/shardmanager-config/member-3-shard-default-config#1462585811&lt;/a&gt;, sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree present=true}}&lt;br/&gt;
2017-06-06 17:36:11,744 | INFO  | ult-dispatcher-5 | ClientActorBehavior              | 197 - org.opendaylight.controller.cds-access-client - 1.1.1.SNAPSHOT | member-3-frontend-datastore-config: refreshing backend for shard 0&lt;br/&gt;
2017-06-06 17:36:11,756 | INFO  | lt-dispatcher-31 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Leader): Term 3 in &quot;RequestVote &lt;span class=&quot;error&quot;&gt;&amp;#91;term=3, candidateId=member-1-shard-default-config, lastLogIndex=21003, lastLogTerm=2&amp;#93;&lt;/span&gt;&quot; message is greater than leader&apos;s term 2 - switching to Follower&lt;br/&gt;
2017-06-06 17:36:11,757 | INFO  | lt-dispatcher-31 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Leader) :- Switching from behavior Leader to Follower, election term: 3&lt;br/&gt;
2017-06-06 17:36:11,757 | INFO  | lt-dispatcher-30 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config: Received LeaderStateChanged message: LeaderStateChanged &lt;span class=&quot;error&quot;&gt;&amp;#91;memberId=member-3-shard-default-config, leaderId=null, leaderPayloadVersion=-1&amp;#93;&lt;/span&gt;&lt;br/&gt;
2017-06-06 17:36:11,757 | INFO  | lt-dispatcher-22 | RoleChangeNotifier               | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | RoleChangeNotifier for member-3-shard-default-config , received role change from Leader to Follower&lt;br/&gt;
2017-06-06 17:36:11,757 | INFO  | lt-dispatcher-22 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config: Received role changed for member-3-shard-default-config from Leader to Follower&lt;br/&gt;
2017-06-06 17:36:11,769 | INFO  | ult-dispatcher-5 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Follower): The prevLogIndex 21001 was found in the log but the term -1 is not equal to the append entriesprevLogTerm 2 - lastIndex: 21003, snapshotIndex: 21002&lt;br/&gt;
2017-06-06 17:36:11,769 | INFO  | ult-dispatcher-5 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Follower): Follower is out-of-sync so sending negative reply: AppendEntriesReply &lt;span class=&quot;error&quot;&gt;&amp;#91;term=3, success=false, followerId=member-3-shard-default-config, logLastIndex=21003, logLastTerm=2, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3&amp;#93;&lt;/span&gt;&lt;br/&gt;
2017-06-06 17:36:11,770 | INFO  | ult-dispatcher-5 | aftActorLeadershipTransferCohort | 193 - org.opendaylight.controller.sal-akka-raft - 1.5.1.SNAPSHOT | member-3-shard-default-config: Successfully transferred leadership to member-1-shard-default-config in 445.5 ms&lt;br/&gt;
2017-06-06 17:36:11,770 | INFO  | ult-dispatcher-5 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config: Received LeaderStateChanged message: LeaderStateChanged &lt;span class=&quot;error&quot;&gt;&amp;#91;memberId=member-3-shard-default-config, leaderId=member-1-shard-default-config, leaderPayloadVersion=5&amp;#93;&lt;/span&gt;&lt;br/&gt;
2017-06-06 17:36:11,771 | INFO  | ult-dispatcher-5 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config: All Shards are ready - data store config is ready, available count is 0&lt;br/&gt;
2017-06-06 17:36:11,772 | INFO  | ult-dispatcher-6 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done false&lt;br/&gt;
2017-06-06 17:36:11,787 | INFO  | ult-dispatcher-5 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Follower): The prevLogIndex 21001 was found in the log but the term -1 is not equal to the append entriesprevLogTerm 2 - lastIndex: 21003, snapshotIndex: 21002&lt;br/&gt;
2017-06-06 17:36:11,787 | INFO  | ult-dispatcher-5 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-3-shard-default-config (Follower): Follower is out-of-sync so sending negative reply: AppendEntriesReply &lt;span class=&quot;error&quot;&gt;&amp;#91;term=3, success=false, followerId=member-3-shard-default-config, logLastIndex=21003, logLastTerm=2, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3&amp;#93;&lt;/span&gt;&lt;br/&gt;
2017-06-06 17:36:11,787 | INFO  | ult-dispatcher-6 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done true&lt;/p&gt;

&lt;p&gt;It is not clear why no reconnect succeeded after this:&lt;/p&gt;

&lt;p&gt;2017-06-06 17:36:11,820 | WARN  | monPool-worker-2 | AbstractShardBackendResolver     | 199 - org.opendaylight.control&lt;br/&gt;
java.util.concurrent.TimeoutException: Connection attempt failed&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBacken&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$2(Ab&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)&lt;br/&gt;
&amp;lt;--&amp;gt;at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)&lt;br/&gt;
Caused by: org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actor[akka://opendaylight-clu&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:436)&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:306)&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntype&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)&lt;br/&gt;
&amp;lt;--&amp;gt;at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.actor.Actor$class.aroundReceive(Actor.scala:502)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scal&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.actor.ActorCell.invoke(ActorCell.scala:495)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.dispatch.Mailbox.run(Mailbox.scala:224)&lt;br/&gt;
&amp;lt;--&amp;gt;at akka.dispatch.Mailbox.exec(Mailbox.scala:234)&lt;br/&gt;
&amp;lt;--&amp;gt;at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)&lt;br/&gt;
&amp;lt;--&amp;gt;at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)&lt;br/&gt;
&amp;lt;--&amp;gt;at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)&lt;br/&gt;
&amp;lt;--&amp;gt;at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)&lt;/p&gt;

&lt;p&gt;Finally, the writer failed with RequestTimeoutException &lt;span class=&quot;error&quot;&gt;&amp;#91;2&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/58355/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/58355/2&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/6/odl3_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/6/odl3_karaf.log.gz&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;2&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/6/log.html.gz#s1-s6-t1-k2-k11-k1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/6/log.html.gz#s1-s6-t1-k2-k11-k1&lt;/a&gt;&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26261">CONTROLLER-1707</key>
            <summary>Client not reconnecting successfully after leader movement</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="rovarga">Robert Varga</assignee>
                                    <reporter username="vrpolak">Vratko Polak</reporter>
                        <labels>
                    </labels>
                <created>Wed, 7 Jun 2017 11:11:18 +0000</created>
                <updated>Tue, 25 Jul 2023 08:24:42 +0000</updated>
                            <resolved>Thu, 24 Aug 2017 13:04:32 +0000</resolved>
                                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="52347" author="tcere" created="Wed, 7 Jun 2017 14:15:54 +0000"  >&lt;p&gt;member-1 switched to leader at 17:36:11,800; it seems the PrimaryShardInfoCache is not flushed on NotLeaderException&lt;/p&gt;</comment>
                            <comment id="52348" author="vrpolak" created="Mon, 12 Jun 2017 08:08:25 +0000"  >&lt;p&gt;This is affecting &lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; one longevity job. See karaf.log &lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; between 04:43:03,671 and 04:45:08,911.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/10/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/10/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/10/odl2_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/10/odl2_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52349" author="vrpolak" created="Mon, 12 Jun 2017 08:19:16 +0000"  >&lt;p&gt;Also, this affects &quot;remote to remote&quot; test case &lt;span class=&quot;error&quot;&gt;&amp;#91;5&amp;#93;&lt;/span&gt; for module shard make-leader-local test, which was consistently passing previously.&lt;/p&gt;

&lt;p&gt;Looking at karaf.log &lt;span class=&quot;error&quot;&gt;&amp;#91;6&amp;#93;&lt;/span&gt;, frequent syncing on followers might be one of the reasons why this Bug happens quite frequently:&lt;/p&gt;

&lt;p&gt;2017-06-10 10:43:50,814 | INFO  | rd-dispatcher-31 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done false&lt;br/&gt;
2017-06-10 10:43:50,826 | INFO  | rd-dispatcher-53 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done true&lt;br/&gt;
2017-06-10 10:43:50,859 | INFO  | rd-dispatcher-37 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done false&lt;br/&gt;
2017-06-10 10:43:50,859 | INFO  | rd-dispatcher-37 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done true&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;5&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/738/log.html.gz#s1-s24-t3-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/738/log.html.gz#s1-s24-t3-k2-k10&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;6&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/738/odl3_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/738/odl3_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52350" author="vrpolak" created="Wed, 14 Jun 2017 12:21:54 +0000"  >&lt;p&gt;Recent build shows another symptom:&lt;br/&gt;
2017-06-14 05:17:49,461 | INFO  | rd-dispatcher-46 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config: Received LeaderStateChanged message: LeaderStateChanged &lt;span class=&quot;error&quot;&gt;&amp;#91;memberId=member-2-shard-default-config, leaderId=null, leaderPayloadVersion=5&amp;#93;&lt;/span&gt;&lt;br/&gt;
2017-06-14 05:17:49,847 | INFO  | rd-dispatcher-39 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.Carbon | member-2-shard-default-config: not currently leader, rejecting request Envelope{sessionId=0, txSequence=2acd, message=CommitLocalTransactionRequest{target=member-2-datastore-config-fe-0-chn-2-txn-5472-0, sequence=0, replyTo=Actor&lt;a href=&quot;#-1700534907&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;akka://opendaylight-cluster-data/user/$a#-1700534907&lt;/a&gt;, coordinated=false}}. isLeader: true, isLeaderActive: false,isLeadershipTransferInProgress: true.&lt;br/&gt;
2017-06-14 05:17:49,849 | INFO  | rd-dispatcher-39 | Shard                            | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.Carbon | member-2-shard-default-config: not currently leader, rejecting request Envelope{sessionId=0, txSequence=2ace, message=CommitLocalTransactionRequest{target=member-2-datastore-config-fe-0-chn-2-txn-5473-0, sequence=0, replyTo=Actor&lt;a href=&quot;#-1700534907&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;akka://opendaylight-cluster-data/user/$a#-1700534907&lt;/a&gt;, coordinated=false}}. isLeader: true, isLeaderActive: false,isLeadershipTransferInProgress: true.&lt;/p&gt;

&lt;p&gt;The last message was repeated not two times, but 15 times.&lt;/p&gt;</comment>
                            <comment id="52351" author="rovarga" created="Wed, 14 Jun 2017 13:07:47 +0000"  >&lt;p&gt;The trouble seems to be in ShardBackendResolver, where we do not invalidate unsuccessful attempts and hold on to the failed state.&lt;/p&gt;</comment>
                            <comment id="52352" author="rovarga" created="Wed, 14 Jun 2017 18:05:27 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/58936&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/58936&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52353" author="rovarga" created="Wed, 14 Jun 2017 22:45:01 +0000"  >&lt;p&gt;That last batch of messages should be fixed by the above patch.&lt;/p&gt;

&lt;p&gt;The info messages:&lt;/p&gt;

&lt;p&gt;2017-06-10 10:43:50,814 | INFO  | rd-dispatcher-31 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done false&lt;br/&gt;
2017-06-10 10:43:50,826 | INFO  | rd-dispatcher-53 | ShardManager                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | shard-manager-config Received follower initial sync status for member-3-shard-default-config status sync done true&lt;/p&gt;

&lt;p&gt;Tom, I think it is time to make it tunable and log where we are with thresholds. With pipelining and batching we can send out a batch of append operations, and hence the indexes will be jumping by the size of those batches.&lt;/p&gt;</comment>
                            <comment id="52354" author="rovarga" created="Thu, 15 Jun 2017 10:37:55 +0000"  >&lt;p&gt;Configuration knob: &lt;a href=&quot;https://git.opendaylight.org/gerrit/58991&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/58991&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52355" author="tpantelis" created="Thu, 15 Jun 2017 15:03:36 +0000"  >&lt;p&gt;I think you&apos;re right about the batching causing false out-of-sync status. I think Follower should update the SyncStatusTracker after it updates its commitIndex.&lt;/p&gt;</comment>
                            <comment id="52356" author="rovarga" created="Thu, 15 Jun 2017 15:52:12 +0000"  >&lt;p&gt;Tom, do you mean something like: &lt;a href=&quot;https://git.opendaylight.org/gerrit/59037&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/59037&lt;/a&gt; ?&lt;/p&gt;</comment>
                            <comment id="52357" author="tpantelis" created="Thu, 15 Jun 2017 15:59:35 +0000"  >&lt;p&gt;(In reply to Robert Varga from comment #10)&lt;br/&gt;
&amp;gt; Tom, do you mean something like: &lt;a href=&quot;https://git.opendaylight.org/gerrit/59037&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/59037&lt;/a&gt; ?&lt;/p&gt;

&lt;p&gt;exactly&lt;/p&gt;</comment>
                            <comment id="52358" author="vrpolak" created="Tue, 20 Jun 2017 10:14:49 +0000"  >&lt;p&gt;&amp;gt; PrimaryShardInfoCache is not flushed on NotLeaderException&lt;/p&gt;

&lt;p&gt;I think this, or something similar, still needs to be fixed.&lt;/p&gt;

&lt;p&gt;Recent RelEng run showed RequestTimeoutException &lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; after module-based shard replica was removed from Leader located on a different member than writer.&lt;/p&gt;

&lt;p&gt;Karaf.log on the writer &lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; shows the initial NotLeaderException at 06:47:44,331, the replaced connection to the new leader at 06:48:49,054 and transaction error at 06:49:44,761.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;0&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/755/log.html.gz#s1-s20-t3-k2-k9&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/755/log.html.gz#s1-s20-t3-k2-k9&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/755/odl2_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/755/odl2_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52359" author="rovarga" created="Tue, 20 Jun 2017 12:10:17 +0000"  >&lt;p&gt;2017-06-20 06:48:49,058 | WARN  | rd-dispatcher-36 | ShardDataTree                    | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | member-3-shard-default-config: No transactions enqueued while attempting to start canCommit on org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort$Normal@389c8969&lt;/p&gt;

&lt;p&gt;would explain that. Need debugs to move forward.&lt;/p&gt;</comment>
                            <comment id="52360" author="vrpolak" created="Mon, 3 Jul 2017 14:43:50 +0000"  >&lt;p&gt;It seems this can still happen, but with such reduced frequency that a longevity test is needed to reproduce it reliably &lt;span class=&quot;error&quot;&gt;&amp;#91;9&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;In this run and iteration (previous 76 passed), Leader is being moved away from a member where a single writer is running (the same as the initial description of this Bug).&lt;/p&gt;

&lt;p&gt;&amp;gt; would explain that.&lt;/p&gt;

&lt;p&gt;Yes, &quot;No transactions enqueued&quot; happened (in the log &lt;span class=&quot;error&quot;&gt;&amp;#91;10&amp;#93;&lt;/span&gt;), that is (one more reason) why I believe this Bug is distinct &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1737&quot; title=&quot;Current transaction in state READY leads to RequestTimeoutException (120s) after make-leader-local&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1737&quot;&gt;&lt;del&gt;CONTROLLER-1737&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;2017-07-02 00:17:03,690 | WARN  | rd-dispatcher-36 | ShardDataTree                    | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | member-1-shard-default-config: No transactions enqueued while attempting to start canCommit on org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort$Normal@4887c30b&lt;/p&gt;

&lt;p&gt;Also a Sandbox run sees the same failure &lt;span class=&quot;error&quot;&gt;&amp;#91;11&amp;#93;&lt;/span&gt; and message (log &lt;span class=&quot;error&quot;&gt;&amp;#91;12&amp;#93;&lt;/span&gt; at 11:33:40,546). Another run with more verbose logs will come shortly.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;9&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/13/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/13/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;10&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/13/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/13/odl1_karaf.log.gz&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;11&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;12&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/odl2_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/odl2_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52361" author="rovarga" created="Mon, 3 Jul 2017 15:53:29 +0000"  >&lt;p&gt;I think the trouble comes from this bit:&lt;/p&gt;

&lt;p&gt;2017-07-03 11:33:32,294 | WARN  | rd-dispatcher-35 | ShardDataTree                    | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | member-2-shard-default-config: Current transaction member-1-datastore-config-fe-0-chn-8-txn-7141-0 has timed out after 15000 ms in state READY&lt;br/&gt;
2017-07-03 11:33:40,546 | WARN  | rd-dispatcher-49 | ShardDataTree                    | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | member-2-shard-default-config: No transactions enqueued while attempting to start canCommit on org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort$Normal@3b1d9491&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/59890&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/59890&lt;/a&gt; adds more logging, but I do believe the second line points to the same transaction as the one aborted in ready state.&lt;/p&gt;

&lt;p&gt;When that happens, we only remove the head of the queue, but do not propagate that to the cohort (ShardDataTree line 1044).&lt;/p&gt;</comment>
                            <comment id="52362" author="rovarga" created="Mon, 3 Jul 2017 16:59:58 +0000"  >&lt;p&gt;This scenario can happen if we are performing a reconnect operation concurrently with the transaction being submitted &amp;#8211; we have to flush the state to the backend (hence the transaction is READY), but the actual commit request can be delayed.&lt;/p&gt;

&lt;p&gt;From a long-term perspective, I don&apos;t think ShardDataTree should be pruning ready transactions, as they should not be holding up forward progress. Unfortunately that requires interaction with the leader movement logic and the addition of per-chain queues (to detect the bad case when a subsequent transaction is actually scheduled for commit).&lt;/p&gt;

&lt;p&gt;In the short term, &lt;a href=&quot;https://git.opendaylight.org/gerrit/59895&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/59895&lt;/a&gt; propagates the timeout in READY state towards the cohort &amp;#8211; hence rather than throwing a warning on the backend and taking the usual 120 seconds to time out (because the backend will not produce a response), we will report the timeout as a result of the CAN_COMMIT stage.&lt;/p&gt;</comment>
                            <comment id="52363" author="vrpolak" created="Mon, 3 Jul 2017 17:11:12 +0000"  >&lt;p&gt;Sandbox run with a logging change &lt;span class=&quot;error&quot;&gt;&amp;#91;13&amp;#93;&lt;/span&gt; failed here &lt;span class=&quot;error&quot;&gt;&amp;#91;14&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;13&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59890/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59890/2&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;14&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/4/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/4/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52364" author="vrpolak" created="Tue, 4 Jul 2017 10:07:32 +0000"  >&lt;p&gt;The comment &lt;span class=&quot;error&quot;&gt;&amp;#91;15&amp;#93;&lt;/span&gt; was intended for this Bug.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;15&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://bugs.opendaylight.org/show_bug.cgi?id=8494#c26&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugs.opendaylight.org/show_bug.cgi?id=8494#c26&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52365" author="vrpolak" created="Tue, 4 Jul 2017 10:10:32 +0000"  >&lt;p&gt;Copy of &lt;span class=&quot;error&quot;&gt;&amp;#91;15&amp;#93;&lt;/span&gt; with changed link numbers:&lt;/p&gt;

&lt;p&gt;With &lt;span class=&quot;error&quot;&gt;&amp;#91;16&amp;#93;&lt;/span&gt; the message &lt;span class=&quot;error&quot;&gt;&amp;#91;17&amp;#93;&lt;/span&gt; is different:&lt;br/&gt;
&quot;Backend timeout in state READY after 15000ms&quot; is returned 35 seconds after the leader movement.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;16&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59895/1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59895/1&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;17&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/5/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/5/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52366" author="vrpolak" created="Tue, 4 Jul 2017 10:12:48 +0000"  >&lt;p&gt;Code &lt;span class=&quot;error&quot;&gt;&amp;#91;18&amp;#93;&lt;/span&gt; shows the same Robot behavior &lt;span class=&quot;error&quot;&gt;&amp;#91;19&amp;#93;&lt;/span&gt; as &lt;span class=&quot;error&quot;&gt;&amp;#91;16&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;18&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59896/1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59896/1&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;19&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/6/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/6/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52367" author="vrpolak" created="Tue, 4 Jul 2017 11:04:53 +0000"  >&lt;p&gt;&amp;gt; shows the same Robot behavior&lt;/p&gt;

&lt;p&gt;As does &lt;span class=&quot;error&quot;&gt;&amp;#91;20&amp;#93;&lt;/span&gt; code &lt;span class=&quot;error&quot;&gt;&amp;#91;21&amp;#93;&lt;/span&gt;.&lt;br/&gt;
If anything, the failures happen after fewer iterations, compared to current stable/carbon.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;20&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/7/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/7/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;21&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59898/1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59898/1&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52368" author="vrpolak" created="Tue, 4 Jul 2017 12:35:13 +0000"  >&lt;p&gt;The next patch set &lt;span class=&quot;error&quot;&gt;&amp;#91;22&amp;#93;&lt;/span&gt; still looks &lt;span class=&quot;error&quot;&gt;&amp;#91;23&amp;#93;&lt;/span&gt; the same.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;22&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59898/2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59898/2&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;23&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/8/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/8/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52369" author="vrpolak" created="Thu, 6 Jul 2017 09:53:27 +0000"  >&lt;p&gt;Another patch &lt;span class=&quot;error&quot;&gt;&amp;#91;24&amp;#93;&lt;/span&gt;, the same result &lt;span class=&quot;error&quot;&gt;&amp;#91;25&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;24&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/59998/1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/59998/1&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;25&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/9/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/9/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52370" author="rovarga" created="Thu, 6 Jul 2017 15:52:59 +0000"  >&lt;p&gt;Since the tell-based protocol can replay state from the frontend, but requires the command stream to be processed in order, we have a different set of constraints on what needs to happen on the backend during leadership transfer.&lt;/p&gt;

&lt;p&gt;The first part of this is that Shard needs to know when a pauseLeader() operation no longer runs and has been aborted. Patch &lt;a href=&quot;https://git.opendaylight.org/gerrit/60032&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/60032&lt;/a&gt; introduces an unpauseLeader() hook.&lt;/p&gt;

&lt;p&gt;The second part will be blowing up LeaderFrontendState and purging it from ShardDataTree when pauseLeader() is invoked &amp;#8211; which is fine, as we do not allow the tell-based protocol to make any progress while we are not an active leader. If the transition process succeeds, we end up being a follower or shut down, which is fine. If the transition fails, we can restore LeaderFrontendState from the journal, the same way we do it when we transition from being a follower.&lt;/p&gt;</comment>
                            <comment id="52371" author="vrpolak" created="Mon, 10 Jul 2017 11:43:48 +0000"  >&lt;p&gt;&amp;gt; &quot;Backend timeout in state READY after 15000ms&quot; is returned 35 seconds after the leader movement.&lt;/p&gt;

&lt;p&gt;Later code &lt;span class=&quot;error&quot;&gt;&amp;#91;26&amp;#93;&lt;/span&gt;, behavior &lt;span class=&quot;error&quot;&gt;&amp;#91;27&amp;#93;&lt;/span&gt; still the same.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;26&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/60033/5&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/60033/5&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;27&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/1/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k2-k1-k1-k2-k10&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52372" author="vrpolak" created="Mon, 10 Jul 2017 12:21:38 +0000"  >&lt;p&gt;This still happens &lt;span class=&quot;error&quot;&gt;&amp;#91;28&amp;#93;&lt;/span&gt; on occasion in releng functional tests, as identified by &quot;No transactions enqueued&quot; in the new leader&apos;s karaf.log &lt;span class=&quot;error&quot;&gt;&amp;#91;29&amp;#93;&lt;/span&gt; (at 09:21:00,668).&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;28&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/776/log.html.gz#s1-s20-t3-k2-k8&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/776/log.html.gz#s1-s20-t3-k2-k8&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;29&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/776/odl3_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/776/odl3_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52373" author="rovarga" created="Mon, 10 Jul 2017 13:57:24 +0000"  >&lt;p&gt;The failure in comment 24 looks like it is caused by synchronization in the test driver. More specifically, the frontend thread is prevented from executing; it is suspended between:&lt;/p&gt;

&lt;p&gt;2017-07-10 10:06:47,396 | DEBUG | ult-dispatcher-5 | AbstractClientConnection         | 197 - org.opendaylight.controller.cds-access-client - 1.1.1.SNAPSHOT | Completing TransmittedConnectionEntry{request=ModifyTransactionRequest{target=member-3-datastore-config-fe-0-chn-8-txn-7786-0, sequence=1, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#1563295464], modifications=0, protocol=SIMPLE}, enqueuedTicks=1154543474499, sessionId=12, txSequence=7698} with Envelope{sessionId=c, txSequence=1e12, message=TransactionCommitSuccess{target=member-3-datastore-config-fe-0-chn-8-txn-7786-0, sequence=1}}&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;2017-07-10 10:07:12,575 | DEBUG | ult-dispatcher-5 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.1.SNAPSHOT | Future #7786 completed successfully&lt;/p&gt;

&lt;p&gt;During this time the test driver is busily introducing new transactions and holding the lock...&lt;/p&gt;</comment>
                            <comment id="52374" author="rovarga" created="Mon, 10 Jul 2017 14:14:03 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/60033&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/60033&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://git.opendaylight.org/gerrit/60137&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/60137&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10000">
                    <name>Blocks</name>
                                            <outwardlinks description="blocks">
                                        <issuelink>
            <issuekey id="26291">CONTROLLER-1737</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8618</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=8618]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10337"><![CDATA[Carbon-SR2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02s8f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>