<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:55:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1468] [Clustering] Datastore operations failure when leader is down</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1468</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;Test cases:&lt;br/&gt;
===========&lt;br/&gt;
Topology: 3 nodes. Leader restart with 1 switch and 1K flows per DPN (add, modify, and delete when one node is down)&lt;/p&gt;

&lt;p&gt;Steps:-&lt;br/&gt;
=======&lt;br/&gt;
1. In a 3-node cluster (c1, c2, c3), c2 is the leader and the rest are followers.&lt;br/&gt;
2. Added 1000 flows via c1 and verified that they were replicated across all nodes.&lt;br/&gt;
3. Brought the leader c2 down; c3 became leader.&lt;br/&gt;
4. Deleted 300 flows from follower c1.&lt;br/&gt;
5. A GET operation on the remaining nodes, c1 and c3, still returns 1000 flows; the expectation was 700 flows on each node.&lt;/p&gt;

&lt;p&gt;Please find attached logs of c1 and c3.&lt;/p&gt;

&lt;p&gt;Log output similar to the following is seen while attempting to delete 300 flows from follower c1 or c3&lt;br/&gt;
=========================================================&lt;br/&gt;
2015-12-14 02:32:44,594 | INFO  | qtp440143820-841 | FlowTesterConcurrentImpl         | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Concurrent Flow Tester Impl&lt;br/&gt;
2015-12-14 02:32:47,525 | INFO  | Pool-13-worker-1 | FlowTesterConcurrentImpl         | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Total Flows read: 1001&lt;br/&gt;
2015-12-14 02:33:01,577 | INFO  | qtp440143820-849 | FlowTesterImpl                   | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Sequential Flow Tester Impl&lt;br/&gt;
2015-12-14 02:33:06,602 | WARN  | lt-dispatcher-44 | ConcurrentDOMDataBroker          | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Tx: DOM-9707 Error during phase CAN_COMMIT, starting Abort&lt;br/&gt;
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.183.181.43:2550/), Path(/user/shardmanager-config/member-3-shard-inventory-config)]] after &lt;span class=&quot;error&quot;&gt;&amp;#91;5000 ms&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)&lt;span class=&quot;error&quot;&gt;&amp;#91;120:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[120:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]&lt;br/&gt;
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
       at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)&lt;span class=&quot;error&quot;&gt;&amp;#91;123:com.typesafe.akka.actor:2.3.14&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at java.lang.Thread.run(Thread.java:745)&lt;span class=&quot;error&quot;&gt;&amp;#91;:1.7.0_65&amp;#93;&lt;/span&gt;&lt;br/&gt;
2015-12-14 02:33:06,609 | ERROR | CommitFutures-10 | FlowTesterImpl                   | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Error: TransactionCommitFailedException&lt;/p&gt;
{message=canCommit encountered an unexpected failure, errorList=[RpcError 
[message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, 
cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.183.181.43:2550/), Path
(/user/shardmanager-config/member-3-shard-inventory-config)]] after [5000 ms]]]}
&lt;p&gt; in Datastore write operation: dpid: 1, tableId: 1, sourceIp: 2&lt;br/&gt;
2015-12-14 02:33:07,975 | INFO  | qtp440143820-849 | FlowTesterConcurrentImpl         | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Concurrent Flow Tester Impl&lt;br/&gt;
2015-12-14 02:33:10,772 | INFO  | Pool-15-worker-1 | FlowTesterConcurrentImpl         | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Total Flows read: 1000&lt;/p&gt;


&lt;p&gt;Attached are the logs from the remaining nodes, c3 (new leader) and c1 (follower), captured after the erstwhile leader (controller c2) was brought down.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26022">CONTROLLER-1468</key>
            <summary>[Clustering] Datastore operations failure when leader is down</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Cannot Reproduce</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="muthukumaran.k@ericsson.com">Muthukumaran Kothandaraman</reporter>
                        <labels>
                    </labels>
                <created>Mon, 11 Jan 2016 12:37:36 +0000</created>
                <updated>Thu, 19 Oct 2017 21:26:54 +0000</updated>
                            <resolved>Fri, 26 Feb 2016 06:35:32 +0000</resolved>
                                    <version>Beryllium</version>
                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="51189" author="muthukumaran.k@ericsson.com" created="Mon, 11 Jan 2016 12:37:36 +0000"  >&lt;p&gt;Attachment c1.karaf.rar has been added with description: c1 inventory shard follower log&lt;/p&gt;</comment>
                            <comment id="51190" author="muthukumaran.k@ericsson.com" created="Mon, 11 Jan 2016 12:38:26 +0000"  >&lt;p&gt;Attachment c3.karaf.rar has been added with description: c3 node - new leader after c2 node was brought down&lt;/p&gt;</comment>
                            <comment id="51185" author="tpantelis" created="Tue, 19 Jan 2016 21:16:27 +0000"  >&lt;p&gt;In c3, it looks like c2 (10.183.181.42) was taken down at this point:&lt;/p&gt;

&lt;p&gt;2015-12-14 02:16:01,780 | WARN  | lt-dispatcher-49 | ClusterCoreDaemon                | 124 - com.typesafe.akka.slf4j - 2.3.14 | Cluster Node &lt;span class=&quot;error&quot;&gt;&amp;#91;akka.tcp://opendaylight-cluster-data@10.183.181.43:2550&amp;#93;&lt;/span&gt; - Marking node(s) as UNREACHABLE &lt;span class=&quot;error&quot;&gt;&amp;#91;Member(address = akka.tcp://opendaylight-cluster-data@10.183.181.42:2550, status = Up)&amp;#93;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;c3 became leader at:&lt;/p&gt;

&lt;p&gt;2015-12-14 02:16:07,452 | INFO  | lt-dispatcher-40 | ShardManager                     | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | shard-manager-config: Received role changed for member-3-shard-inventory-config from Candidate to Leader&lt;/p&gt;

&lt;p&gt;Some 2 minutes later a tx timed out:&lt;/p&gt;

&lt;p&gt;2015-12-14 02:18:52,127 | WARN  | lt-dispatcher-48 | Shard                            | 135 - org.opendaylight.controller.sal-akka-raft - 1.3.0.SNAPSHOT | member-3-shard-inventory-config: Current transaction member-1-txn-6406 has timed out after 15000 ms - aborting&lt;/p&gt;

&lt;p&gt;It&apos;s hard to tell why without debugging.&lt;/p&gt;

&lt;p&gt;In step #4, were the 300 deletes batched in 1 tx or done in 300 tx&apos;s?&lt;br/&gt;
Is it reproducible with only 1 flow deleted? Or 10 etc?&lt;/p&gt;

&lt;p&gt;It would be helpful to reproduce with debug enabled just prior to doing the deletes. In the new leader (c3), enable org.opendaylight.controller.cluster.datastore.Shard and in the remote node initiating the deletes (c1), enable org.opendaylight.controller.cluster.RemoteTransactionContext and org.opendaylight.controller.cluster.SingleCommitCohortProxy.&lt;/p&gt;</comment>
                            <comment id="51186" author="jbehran@cisco.com" created="Fri, 19 Feb 2016 12:19:54 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1486&quot; title=&quot;Clustering: Datastore may fail with &amp;quot;Shard XXX has no leader. Try again later&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1486&quot;&gt;&lt;del&gt;CONTROLLER-1486&lt;/del&gt;&lt;/a&gt; might be an incarnation of this bug. In that case the leader appears to be not elected yet when the datastore operation failed.&lt;/p&gt;</comment>
                            <comment id="51187" author="muthukumaran.k@ericsson.com" created="Thu, 25 Feb 2016 11:31:16 +0000"  >&lt;p&gt;We do not use switches, as this test is mainly to study datastore behavior; only the Openflowplugin Inventory model is used.&lt;/p&gt;

&lt;p&gt;With the latest stable/beryllium build, we are not able to reproduce this exact scenario. We tried it once and will repeat it more times to confirm that the issue is really gone.&lt;/p&gt;

&lt;p&gt;We will reach a conclusion by tomorrow at the latest on whether to close this bug.&lt;/p&gt;</comment>
                            <comment id="51188" author="muthukumaran.k@ericsson.com" created="Fri, 26 Feb 2016 06:35:32 +0000"  >&lt;p&gt;This is NOT reproducible in the latest stable/beryllium, so closing this issue.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13562" name="c1.karaf.rar" size="14761" author="muthukumaran.k@ericsson.com" created="Mon, 11 Jan 2016 12:37:36 +0000"/>
                            <attachment id="13563" name="c3.karaf.rar" size="7854" author="muthukumaran.k@ericsson.com" created="Mon, 11 Jan 2016 12:38:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4923</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=4923]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02qrb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>