<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1900] Performance regression in cluster registration</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1900</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;The problem seems to be in all branches:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/247/robot/controller-clustering-ask.txt/Chasing%20The%20Leader/Unregister_Candidates_And_Validate_Criteria/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/247/robot/controller-clustering-ask.txt/Chasing%20The%20Leader/Unregister_Candidates_And_Validate_Criteria/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-tell-all-fluorine/220/robot/controller-clustering-tell.txt/Chasing%20The%20Leader/Unregister_Candidates_And_Validate_Criteria/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-tell-all-fluorine/220/robot/controller-clustering-tell.txt/Chasing%20The%20Leader/Unregister_Candidates_And_Validate_Criteria/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Fluorine it started around May 23rd, so there are multiple suspect patches:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/#/q/branch:stable/fluorine+project:controller&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/q/branch:stable/fluorine+project:controller&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The test itself does:&lt;/p&gt;

&lt;p&gt;1) Start a singleton registration flap on every controller instance with this RPC: /restconf/operations/odl-mdsal-lowlevel-control:register-flapping-singleton&lt;/p&gt;

&lt;p&gt;2) Maintain the flap for 60 secs.&lt;/p&gt;

&lt;p&gt;3) Stop the flap on every controller instance: /restconf/operations/odl-mdsal-lowlevel-control:unregister-flapping-singleton&lt;/p&gt;

&lt;p&gt;4) Get flap count from above RPC response: &amp;lt;output xmlns=&quot;tag:opendaylight.org,2017:controller:yang:lowlevel:control&quot;&amp;gt;&amp;lt;flap-count&amp;gt;83&amp;lt;/flap-count&amp;gt;&amp;lt;/output&amp;gt;&lt;/p&gt;

&lt;p&gt;5) Add all the flaps for the 3 controller instances and divide the total by 60 secs.&lt;/p&gt;
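
&lt;p&gt;As a rough illustration of step 5, the rate calculation can be sketched as follows. The function name and the second and third flap counts are hypothetical; only the value 83 appears in the RPC output quoted above.&lt;/p&gt;

```python
def flaps_per_second(flap_counts, duration_secs=60):
    """Sum the per-instance flap counts and divide by the flap duration."""
    return sum(flap_counts) / duration_secs


# One flap-count per controller instance. 83 is the sample value from the
# RPC output above; 79 and 81 are made-up placeholders for the other nodes.
counts = [83, 79, 81]
print(f"{flaps_per_second(counts):.2f} flaps/sec")
```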

&lt;p&gt;Before the regression, the controller handled ~50 flaps/sec; after the regression, it handles fewer than 5 flaps/sec.&lt;/p&gt;</description>
                <environment></environment>
        <key id="31734">CONTROLLER-1900</key>
            <summary>Performance regression in cluster registration</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.opendaylight.org/images/icons/priorities/blocker.svg">Highest</priority>
                        <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="ecelgp">Luis Gomez</assignee>
                                    <reporter username="ecelgp">Luis Gomez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 6 Jun 2019 04:26:13 +0000</created>
                <updated>Wed, 12 Jun 2019 09:20:20 +0000</updated>
                            <resolved>Wed, 12 Jun 2019 09:20:20 +0000</resolved>
                                    <version>Fluorine SR3</version>
                    <version>Neon SR2</version>
                                    <fixVersion>Sodium</fixVersion>
                    <fixVersion>Fluorine SR3</fixVersion>
                    <fixVersion>Neon SR2</fixVersion>
                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="66847" author="rovarga" created="Thu, 6 Jun 2019 14:23:00 +0000"  >&lt;p&gt;Couple of observations here... I took the baseline of &lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/239/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/239/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The weird thing is that, for some reason, OFP is reacting to entity changes that belong to others, which should not really happen:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2019-05-30T06:42:59,075 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-37 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device pcep-topology-service-group from operational DS

2019-05-30T07:22:50,531 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-3 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Those things are definitely none of OFP&apos;s business, so I wonder how we are getting these. They may end up interfering a bit with CSS, but I am not quite sure &#8211; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/80707/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/80707/&lt;/a&gt; landed in the area.&lt;/p&gt;

&lt;p&gt;In any case, yes, there is a notable drop in the number of these events (239 vs. 247):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nite@nitebug : ~/ctrl1900/x$ cat odl* | fgrep &apos;Try to remove device flapping&apos; | sort | wc -l
16038
nite@nitebug : ~/ctrl1900$ cat odl* | fgrep &apos;Try to remove device flapping&apos; | sort | wc -l
660&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now looking at the details of those logs, it seems the problem is that we are getting pauses:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2019-05-30T07:23:19,461 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-18 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:19,479 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-3 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:19,493 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-2 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS

&amp;gt;&amp;gt; 1 second

2019-05-30T07:23:20,502 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-40 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:20,519 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-40 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:20,527 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-40 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS

&amp;gt;&amp;gt; 1 second

2019-05-30T07:23:21,542 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-3 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:21,576 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-7 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:21,586 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-3 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS
2019-05-30T07:23:22,583 | INFO&#160; | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-39 | ContextChainHolderImpl&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 415 - org.opendaylight.openflowplugin.impl - 0.8.2 | Try to remove device flapping-singleton-service from operational DS

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
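&lt;p&gt;A minimal sketch of how such pauses could be spotted mechanically, assuming the timestamps have already been extracted from the log lines. The timestamps are copied from the excerpt above; the helper names and the 0.5 second threshold are assumptions made for illustration.&lt;/p&gt;

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
STAMPS = [
    "2019-05-30T07:23:19,461",
    "2019-05-30T07:23:19,479",
    "2019-05-30T07:23:19,493",
    "2019-05-30T07:23:20,502",
    "2019-05-30T07:23:20,519",
    "2019-05-30T07:23:20,527",
    "2019-05-30T07:23:21,542",
    "2019-05-30T07:23:21,576",
    "2019-05-30T07:23:21,586",
    "2019-05-30T07:23:22,583",
]


def parse(stamp):
    """Parse a karaf.log style timestamp with comma-separated milliseconds."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f")


def find_pauses(stamps, threshold_secs=0.5):
    """Return (start_of_gap, gap_in_seconds) for gaps above the threshold.

    The threshold is an arbitrary value chosen for this sketch.
    """
    times = [parse(s) for s in stamps]
    pauses = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        if gap > threshold_secs:
            pauses.append((earlier, gap))
    return pauses


for start, gap in find_pauses(STAMPS):
    print(f"pause of {gap:.3f}s after {start.time()}")
```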
&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt; can you try running this test suite with the OFP patch reverted, please?&lt;/p&gt;

&lt;p&gt;In the meantime I will look into why exactly OFP is even getting these events, as that &lt;b&gt;definitely&lt;/b&gt; should not be happening. It is also possible we are leaking metadata in the Shard; I will look at that, too.&lt;/p&gt;

</comment>
                            <comment id="66848" author="rovarga" created="Thu, 6 Jun 2019 14:41:08 +0000"  >&lt;p&gt;Ah, but Fluorine does not have that OFP patch.&lt;/p&gt;</comment>
                            <comment id="66849" author="rovarga" created="Thu, 6 Jun 2019 14:49:18 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt; can we re-run that job with debug enabled on org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard, please?&lt;/p&gt;</comment>
                            <comment id="66850" author="rovarga" created="Thu, 6 Jun 2019 15:04:34 +0000"  >&lt;p&gt;So OFP is explicitly registering to CSS internal events, hence that is not a bug in CSS or EOS, but in OFP &#8211; filed &lt;a href=&quot;https://jira.opendaylight.org/browse/OPNFLWPLUG-1072&quot; title=&quot;ContextChainHolderImpl encroaches on Cluster Singleton Service internals&quot; class=&quot;issue-link&quot; data-issue-key=&quot;OPNFLWPLUG-1072&quot;&gt;OPNFLWPLUG-1072&lt;/a&gt; for that.&lt;/p&gt;</comment>
                            <comment id="66860" author="ecelgp" created="Sat, 8 Jun 2019 05:58:34 +0000"  >&lt;p&gt;This run has the DEBUG activated:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-fluorine/231/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-fluorine/231/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This fixes the gclogs:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/82459/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/82459/&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="66861" author="rovarga" created="Mon, 10 Jun 2019 12:04:59 +0000"  >&lt;p&gt;Okay, I pulled the logs and started looking. ODL1 is the leader here, and it sees a significant pause:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-08T04:23:43,277 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: Transaction member-1-entity-ownership-internal-fe-0-txn-80-0 committed as 0, sending response to Actor[akka://opendaylight-cluster-data/user/shardmanager-operational/member-1-shard-entity-ownership-operational#340224012]
2019-06-08T04:23:43,277 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: fakeSnapshot purging log to 114 for term 1
2019-06-08T04:23:43,277 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational (Leader): Checking sendAppendEntries for follower member-2-shard-entity-ownership-operational: active: true, followerNextIndex: 116, leaderLastIndex: 115, leaderSnapShotIndex: 114
2019-06-08T04:23:43,277 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational (Leader): handleAppendEntriesReply - FollowerLogInformation for member-3-shard-entity-ownership-operational updated: matchIndex: 115, nextIndex: 116
2019-06-08T04:23:43,278 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational (Leader): Checking sendAppendEntries for follower member-3-shard-entity-ownership-operational: active: true, followerNextIndex: 116, leaderLastIndex: 115, leaderSnapShotIndex: 114
2019-06-08T04:23:43,279 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: Persisting ApplyJournalEntries with index=115
2019-06-08T04:23:43,279 | DEBUG | opendaylight-cluster-data-shard-dispatcher-20 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | BatchedModifications commit member-1-entity-ownership-internal-fe-0-txn-80-0 succeeded
2019-06-08T04:23:44,083 | DEBUG | opendaylight-cluster-data-shard-dispatcher-30 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: Applying 1 batched modifications for Tx member-3-entity-ownership-internal-fe-0-txn-17-0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So we have a delay in processing of 804ms. During this time ODL2 does:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-08T04:23:43,282 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Applying state for log index 114 data org.opendaylight.controller.cluster.datastore.persisted.CommitTransactionPayload@5478d243
2019-06-08T04:23:43,283 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: fakeSnapshot purging log to 113 for term 1
2019-06-08T04:23:43,283 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Persisting ApplyJournalEntries with index=114
2019-06-08T04:23:44,084 | DEBUG | opendaylight-cluster-data-shard-dispatcher-31 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Follower): Commit index set to 115
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and ODL3 does:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-08T04:23:43,282 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Applying state for log index 114 data org.opendaylight.controller.cluster.datastore.persisted.CommitTransactionPayload@5478d243
2019-06-08T04:23:43,283 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: fakeSnapshot purging log to 113 for term 1
2019-06-08T04:23:43,283 | DEBUG | opendaylight-cluster-data-shard-dispatcher-46 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Persisting ApplyJournalEntries with index=114
2019-06-08T04:23:44,084 | DEBUG | opendaylight-cluster-data-shard-dispatcher-31 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Follower): Commit index set to 115
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Overall it seems building consensus on followers takes 0.8 seconds every now and then &#8211; and thus we cannot maintain the expected rate. Note this is operational, so the persist call does not really touch disk.&lt;/p&gt;</comment>
                            <comment id="66862" author="rovarga" created="Mon, 10 Jun 2019 13:25:58 +0000"  >&lt;p&gt;This might be related to the test driver processing, as it would seem that for 0.8 seconds there really is nothing to do on the master.&lt;/p&gt;</comment>
                            <comment id="66864" author="rovarga" created="Mon, 10 Jun 2019 15:38:49 +0000"  >&lt;p&gt;These pauses occur at:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;2019-06-08T04:23:43,282 (114)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:44,307 (137)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:45,342 (160)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:46,389 (183)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:47,422 (206)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:48,455 (229)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:49,606 (252)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:50,561 (275)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:51,642 (298)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:52,659 (321)&lt;/li&gt;
	&lt;li&gt;2019-06-08T04:23:53,654 (344)&lt;/li&gt;
	&lt;li&gt;(to be filled in)&lt;/li&gt;
&lt;/ul&gt;
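
&lt;p&gt;As a rough cross-check of the list above, the spacing between pauses and the overall commit rate across them can be computed as sketched here. The timestamps and commit indices are copied from the list; the function names are hypothetical.&lt;/p&gt;

```python
from datetime import datetime

# (timestamp, commit index) pairs copied from the list above.
PAUSES = [
    ("2019-06-08T04:23:43,282", 114),
    ("2019-06-08T04:23:44,307", 137),
    ("2019-06-08T04:23:45,342", 160),
    ("2019-06-08T04:23:46,389", 183),
    ("2019-06-08T04:23:47,422", 206),
    ("2019-06-08T04:23:48,455", 229),
    ("2019-06-08T04:23:49,606", 252),
    ("2019-06-08T04:23:50,561", 275),
    ("2019-06-08T04:23:51,642", 298),
    ("2019-06-08T04:23:52,659", 321),
    ("2019-06-08T04:23:53,654", 344),
]


def parse(stamp):
    """Parse a karaf.log style timestamp with comma-separated milliseconds."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f")


def index_deltas(pauses):
    """Number of commits between each pair of consecutive pauses."""
    indices = [index for _, index in pauses]
    return [b - a for a, b in zip(indices, indices[1:])]


def overall_commit_rate(pauses):
    """Commits per second from the first to the last listed pause."""
    elapsed = (parse(pauses[-1][0]) - parse(pauses[0][0])).total_seconds()
    commits = pauses[-1][1] - pauses[0][1]
    return commits / elapsed


print("commits between pauses:", index_deltas(PAUSES))
print("overall commits/sec:", round(overall_commit_rate(PAUSES), 1))
```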


&lt;p&gt;The numbers are commit indices, i.e. every 23 commits there seems to be a pause of ~0.8 seconds. A typical non-pause time from the above is about 200ms, during which the burst is processed at about 9ms per commit (i.e. at about 111 commits per second), which translates to about 55 flaps a second (I think).&lt;/p&gt;</comment>
                            <comment id="66867" author="ecelgp" created="Mon, 10 Jun 2019 20:26:57 +0000"  >&lt;p&gt;So are we saying the issue is in the test RPC producing the flap?&lt;/p&gt;</comment>
                            <comment id="66868" author="rovarga" created="Mon, 10 Jun 2019 20:50:21 +0000"  >&lt;p&gt;Not sure yet, I&apos;ll have to investigate a bit more. Both &apos;23&apos; and 800ms are quite constant and I don&apos;t remember any CDS/EOS mechanics which would be related to this, so I&apos;ll need to dig through the code and also get debugs from all components in the mix...&lt;/p&gt;</comment>
                            <comment id="66872" author="rovarga" created="Tue, 11 Jun 2019 09:27:00 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-fluorine/236/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-fluorine/236/&lt;/a&gt; is a run with full controller debugs&lt;/p&gt;</comment>
                            <comment id="66875" author="rovarga" created="Tue, 11 Jun 2019 10:24:01 +0000"  >&lt;p&gt;There seems to be a delay in propagating the latest state from the master (ODL2) to the others. ODL3 is holding ownership, hence the delay carries over to ODL3 not relinquishing its state.&lt;/p&gt;

&lt;p&gt;ODL1:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-11T08:25:19,505 | DEBUG | opendaylight-cluster-data-shard-dispatcher-33 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational (Follower): handleAppendEntries: AppendEntries [leaderId=member-2-shard-entity-ownership-operational, prevLogIndex=99, prevLogTerm=1, leaderCommit=99, replicatedToAllIndex=98, payloadVersion=9, recipientRaftVersion=4, leaderRaftVersion=4, leaderAddress=null, entries=[SimpleReplicatedLogEntry [index=100, term=1, payload=org.opendaylight.controller.cluster.datastore.persisted.CommitTransactionPayload@2e722827]]]
2019-06-11T08:25:19,509 | DEBUG | opendaylight-cluster-data-notification-dispatcher-69 | EntityOwnerChangeListener&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 288 - org.opendaylight.controller.sal-distributed-datastore - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: New owner: member-3, Original owner: member-3
2019-06-11T08:25:19,510 | DEBUG | opendaylight-cluster-data-notification-dispatcher-69 | ShardDataTreeChangePublisherActor | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational: Elapsed time for generation of change events for member-1-shard-entity-ownership-operational-DTCL-publisher: 1.312 ms
[...]
2019-06-11T08:25:20,247 | DEBUG | opendaylight-cluster-data-shard-dispatcher-47 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-1-shard-entity-ownership-operational (Follower): Commit index set to 100

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ODL2:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-11T08:25:19,499 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Leader): handleAppendEntriesReply - FollowerLogInformation for member-1-shard-entity-ownership-operational updated: matchIndex: 99, nextIndex: 100
2019-06-11T08:25:19,499 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Leader): Applying to log - commitIndex: 99, lastAppliedIndex: 98
2019-06-11T08:25:19,499 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Leader): Setting last applied to 99
[...]
2019-06-11T08:25:19,502 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Persisting ApplyJournalEntries with index=99
[...]
2019-06-11T08:25:19,512 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Leader): handleAppendEntriesReply - FollowerLogInformation for member-1-shard-entity-ownership-operational updated: matchIndex: 100, nextIndex: 101
[...]
2019-06-11T08:25:19,514 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational (Leader): handleAppendEntriesReply - FollowerLogInformation for member-3-shard-entity-ownership-operational updated: matchIndex: 100, nextIndex: 101
[...]
2019-06-11T08:25:19,515 | DEBUG | opendaylight-cluster-data-shard-dispatcher-32 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Persisting ApplyJournalEntries with index=100
[...]
2019-06-11T08:25:19,517 | DEBUG | opendaylight-cluster-data-notification-dispatcher-53 | ShardDataTreeChangePublisherActor | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Elapsed time for generation of change events for member-2-shard-entity-ownership-operational-DTCL-publisher: 2.777 ms
[...]
2019-06-11T08:25:20,261 | DEBUG | opendaylight-cluster-data-shard-dispatcher-26 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-2-shard-entity-ownership-operational: Applying 1 batched modifications for Tx member-3-entity-ownership-internal-fe-0-txn-14-0

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ODL3:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2019-06-11T08:25:19,506 | DEBUG | opendaylight-cluster-data-shard-dispatcher-27 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-3-shard-entity-ownership-operational (Follower): handleAppendEntries: AppendEntries [leaderId=member-2-shard-entity-ownership-operational, prevLogIndex=99, prevLogTerm=1, leaderCommit=99, replicatedToAllIndex=98, payloadVersion=9, recipientRaftVersion=4, leaderRaftVersion=4, leaderAddress=null, entries=[SimpleReplicatedLogEntry [index=100, term=1, payload=org.opendaylight.controller.cluster.datastore.persisted.CommitTransactionPayload@1d897866]]]
2019-06-11T08:25:19,510 | DEBUG | opendaylight-cluster-data-notification-dispatcher-66 | EntityOwnerChangeListener&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 288 - org.opendaylight.controller.sal-distributed-datastore - 1.8.3.SNAPSHOT | member-3-shard-entity-ownership-operational: New owner: member-3, Original owner: member-3
2019-06-11T08:25:19,510 | DEBUG | opendaylight-cluster-data-notification-dispatcher-66 | ShardDataTreeChangePublisherActor | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-3-shard-entity-ownership-operational: Elapsed time for generation of change events for member-3-shard-entity-ownership-operational-DTCL-publisher: 916.3 &#956;s
[...]
2019-06-11T08:25:20,248 | DEBUG | opendaylight-cluster-data-shard-dispatcher-31 | EntityOwnershipShard&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | member-3-shard-entity-ownership-operational (Follower): Commit index set to 100
[...]
2019-06-11T08:25:20,250 | DEBUG | opendaylight-cluster-data-notification-dispatcher-66 | EntityOwnerChangeListener&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 288 - org.opendaylight.controller.sal-distributed-datastore - 1.8.3.SNAPSHOT | member-3-shard-entity-ownership-operational: New owner: member-3, Original owner:
[...]
2019-06-11T08:25:20,250 | DEBUG | opendaylight-cluster-data-shard-dispatcher-51 | SyncStatusTracker&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 273 - org.opendaylight.controller.sal-akka-raft - 1.8.3.SNAPSHOT | member-3-shard-car-operational: Lagging 0 entries behind leader and reached member-1-shard-car-operational (of expected -1)
2019-06-11T08:25:20,251 | DEBUG | opendaylight-cluster-data-akka.actor.default-dispatcher-36 | EntityOwnershipListenerActor&#160;&#160;&#160;&#160; | 280 - org.opendaylight.controller.sal-clustering-commons - 1.8.3.SNAPSHOT | Notifying EntityOwnershipListener org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl@2b220cd9: DOMEntityOwnershipChange [entity=DOMEntity [type=org.opendaylight.mdsal.AsyncServiceCloseEntityType, id=/(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)entity/entity[{(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)name=flapping-singleton-service}]], state=LOCAL_OWNERSHIP_GRANTED [wasOwner=false, isOwner=true, hasOwner=true], inJeopardy=false]
2019-06-11T08:25:20,252 | DEBUG | opendaylight-cluster-data-akka.actor.default-dispatcher-36 | FlappingSingletonService&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; | 292 - org.opendaylight.controller.samples.clustering-it-provider - 1.8.3.SNAPSHOT | Instantiating flapping-singleton-service.

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="66879" author="rovarga" created="Tue, 11 Jun 2019 14:15:26 +0000"  >&lt;p&gt;Alright, I have figured out what is going on and why things broke with the removal of PurgeTransaction entries.&lt;/p&gt;

&lt;p&gt;State changes here are driven by a DTCL (DataTreeChangeListener), which is hooked onto the local replica. On a Follower, that DTCL fires only once the replica learns the Leader&apos;s commitIndex, which is propagated to Followers via AppendEntries.&lt;/p&gt;

&lt;p&gt;In the pause case, ODL2 is the shard leader and ODL3 is going to be the new owner. Consensus is established when ODL1 replies to AppendEntries, at which point the leader rolls forward and finishes the transaction. After that, the code processing AppendEntriesReply checks whether it should send another AppendEntries request. As the journal content has not changed, it does not. The same thing happens when ODL3 responds.&lt;/p&gt;

&lt;p&gt;The system is in steady state from ODL2&apos;s perspective, but some state is still pending: ODL3&apos;s DTCL has not fired, because ODL3 does not know the transaction has been applied. This continues until the next heartbeat, at which point ODL2 sends an empty AppendEntries message containing the updated commitIndex to both ODL1 and ODL3. As the commitIndex is bumped, ODL3 fires the DTCL, which notifies the ownership listeners and unblocks the system again.&lt;/p&gt;

&lt;p&gt;The reason this used to work is that, during transaction processing, ShardDataTree persisted a PurgeTransactionPayload. The check in the AppendEntriesReply code therefore saw a new unreplicated entry and immediately sent it to ODL1 (and then to ODL3 when its reply arrived). That AppendEntries also carried the updated commitIndex, so the followers fired the DTCL immediately.&lt;/p&gt;</comment>
                            <comment id="66881" author="rovarga" created="Tue, 11 Jun 2019 14:34:01 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/secure/ViewProfile.jspa?name=ecelgp&quot; class=&quot;user-hover&quot; rel=&quot;ecelgp&quot;&gt;ecelgp&lt;/a&gt; can you give &lt;a href=&quot;https://git.opendaylight.org/gerrit/82486&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/82486&lt;/a&gt; a try, please?&lt;/p&gt;</comment>
                            <comment id="66884" author="rovarga" created="Tue, 11 Jun 2019 21:30:51 +0000"  >&lt;p&gt;It seems to have been on the right track: &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-fluorine/239/robot-plugin/log.html.gz#s1-s14&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-fluorine/239/robot-plugin/log.html.gz#s1-s14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Latest patch runs here: &lt;a href=&quot;https://jenkins.opendaylight.org/releng/job/controller-patch-test-core-fluorine/2/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jenkins.opendaylight.org/releng/job/controller-patch-test-core-fluorine/2/&lt;/a&gt;&lt;/p&gt;

</comment>
                            <comment id="66887" author="rovarga" created="Tue, 11 Jun 2019 23:40:10 +0000"  >&lt;p&gt;&lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-fluorine/240/robot-plugin/log.html.gz#s1-s14-t3-k2-k4&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-fluorine/240/robot-plugin/log.html.gz#s1-s14-t3-k2-k4&lt;/a&gt; reports the rate at 84.38 flaps/second, so I think this is fixed.&lt;/p&gt;</comment>
                            <comment id="66888" author="ecelgp" created="Tue, 11 Jun 2019 23:53:15 +0000"  >&lt;p&gt;Yes it is.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10003">
                    <name>Relates</name>
                                                                <inwardlinks description="relates to">
                                        <issuelink>
            <issuekey id="31714">CONTROLLER-1897</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="31735">OPNFLWPLUG-1072</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03o4n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>