<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1632] IOException in initiateCaptureSnapshot</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1632</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;odl: distribution-karaf-0.6.0-20170411.050836-4837.zip&lt;/p&gt;


&lt;p&gt;The test scenario follows the steps of DOMDataBroker testing: Leader Isolation: partition heals within transaction timeout&lt;br/&gt;
1)  start test-transaction producer, running at 1K tps, non-overlapping, from all nodes to a single shard&lt;br/&gt;
2)  Isolate leader&lt;br/&gt;
3)  Wait for followers to initiate election&lt;br/&gt;
4)  Un-isolate leader&lt;br/&gt;
5)  Wait for partition to heal  - this was done within transaction timeout (30s)&lt;/p&gt;

&lt;p&gt;After the followers elected a new leader, the isolated node was rejoined (unblocked via iptables).&lt;br/&gt;
Then the test waited for all 3 nodes to be reported correctly. A few seconds after the rejoin, the node still reported the IsolatedLeader raft state and then&lt;br/&gt;
started to respond with HTTP status 401 (Unauthorized).&lt;br/&gt;
This took&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26186">CONTROLLER-1632</key>
            <summary>IOException in initiateCaptureSnapshot</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="10004" iconUrl="https://jira.opendaylight.org/images/icons/status_generic.gif" description="">Verified</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="pgubka@cisco.com">Peter Gubka</reporter>
                        <labels>
                    </labels>
                <created>Tue, 11 Apr 2017 10:00:26 +0000</created>
                <updated>Tue, 25 Jul 2023 08:24:29 +0000</updated>
                            <resolved>Fri, 21 Apr 2017 12:37:10 +0000</resolved>
                                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="51971" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:00:26 +0000"  >&lt;p&gt;Attachment log.html has been added with description: robot log&lt;/p&gt;</comment>
                            <comment id="51972" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:00:51 +0000"  >&lt;p&gt;Attachment karaf_log_1.tar.gz has been added with description: karaf log&lt;/p&gt;</comment>
                            <comment id="51973" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:01:13 +0000"  >&lt;p&gt;Attachment karaf_log_2.tar.gz has been added with description: karaf log 2&lt;/p&gt;</comment>
                            <comment id="51974" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:01:53 +0000"  >&lt;p&gt;Attachment karaf_log_3.tar.gz has been added with description: karaf log 3&lt;/p&gt;</comment>
                            <comment id="51965" author="tpantelis" created="Tue, 11 Apr 2017 11:33:34 +0000"  >&lt;p&gt;Which shard are you looking at? In looking at the default-config shard, I see that node1 was the leader and was isolated:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:33,069 | INFO  | ult-dispatcher-4 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-default-config (Leader) :- Switching from behavior Leader to IsolatedLeader, election term: 2&lt;/p&gt;

&lt;p&gt;Meanwhile node3 became leader on the other side of the partition with term 3:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:28,644 | INFO  | lt-dispatcher-42 | ShardManager                     | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-config: Received role changed for member-3-shard-default-config from PreLeader to Leader&lt;/p&gt;

&lt;p&gt;node1 switched to follower at:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:38,854 | INFO  | lt-dispatcher-29 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-default-config (IsolatedLeader): Term 3 in &quot;AppendEntriesReply &lt;span class=&quot;error&quot;&gt;&amp;#91;term=3, success=false, followerId=member-2-shard-default-config, logLastIndex=20, logLastTerm=3, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3&amp;#93;&lt;/span&gt;&quot; message is greater than leader&apos;s term 2 - switching to Follower&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:38,855 | INFO  | lt-dispatcher-29 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-default-config (IsolatedLeader) :- Switching from behavior IsolatedLeader to Follower, election term: 3&lt;/p&gt;

&lt;p&gt;New leader node3 then installed a snapshot on node1 to catch it up:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:40,693 | INFO  | lt-dispatcher-34 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-3-shard-default-config (Leader): handleAppendEntriesReply: follower member-1-shard-default-config lastIndex 24 is ahead of our lastIndex 20 (snapshotIndex 17) - forcing install snaphot&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:40,758 | INFO  | lt-dispatcher-83 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-default-config: Applying snapshot on follower:  Snapshot &lt;span class=&quot;error&quot;&gt;&amp;#91;lastIndex=20, lastTerm=3, lastAppliedIndex=20, lastAppliedTerm=3, unAppliedEntries size=0, state=org.opendaylight.controller.cluster.datastore.persisted.ShardSnapshotState@51af049b, electionTerm=3, electionVotedFor=null, ServerConfigPayload=null&amp;#93;&lt;/span&gt;&lt;/p&gt;


&lt;p&gt;I don&apos;t see a problem here.&lt;/p&gt;</comment>
                            <comment id="51966" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 12:13:25 +0000"  >&lt;p&gt;To correct my problem description:&lt;br/&gt;
After the rejoin, the test again iterates over all nodes and verifies the cluster status.&lt;br/&gt;
Some time after the rejoin it reported the status of the node as IsolatedLeader. After some time Follower status should be reported. Instead, the node started to respond with 401 (probably to any request). In the attached test case it responded 401 to GET for /restconf/modules.&lt;/p&gt;

&lt;p&gt;I changed the bug description too.&lt;/p&gt;</comment>
                            <comment id="51967" author="tpantelis" created="Tue, 11 Apr 2017 14:05:46 +0000"  >&lt;p&gt;(In reply to Peter Gubka from comment #5)&lt;br/&gt;
&amp;gt; To correct my problem description:&lt;br/&gt;
&amp;gt; After the rejoin, the test again iterates over all nodes and verifies the&lt;br/&gt;
&amp;gt; cluster status.&lt;br/&gt;
&amp;gt; Some time after the rejoin it reported the status of the node as&lt;br/&gt;
&amp;gt; IsolatedLeader. After some time Follower status should be reported. Instead,&lt;br/&gt;
&amp;gt; the node started to respond with 401 (probably to any request). In the&lt;br/&gt;
&amp;gt; attached test case it responded 401 to GET for /restconf/modules.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; I changed the bug description too.&lt;/p&gt;

&lt;p&gt;As I mentioned, from the logs the default config shard did transition from IsolatedLeader to Follower as expected. A 401 response means &quot;Unauthorized&quot; which means the credentials were rejected. This would not be an issue with clustering. You need to look at the robot logs to see what it&apos;s sending and to which node.&lt;/p&gt;

&lt;p&gt;Later on in the log for node3, I see this error repeated a few times:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:47:56,268 | INFO  | lt-dispatcher-33 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-3-shard-default-config (Leader): handleAppendEntriesReply: follower member-1-shard-default-config lastIndex 24 is ahead of our lastIndex 21 (snapshotIndex 17) - forcing install snaphot&lt;br/&gt;
2017-04-11 08:47:56,419 | WARN  | lt-dispatcher-80 | ShardManager                     | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Supervisor Strategy caught unexpected exception - resuming&lt;br/&gt;
java.lang.RuntimeException: java.io.IOException: The # of bytes read from the imput stream, -1, does not match the expected # 3075&lt;br/&gt;
        at com.google.common.base.Throwables.propagate(Throwables.java:160)&lt;span class=&quot;error&quot;&gt;&amp;#91;84:com.google.guava:18.0.0&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.sendSnapshotChunk(AbstractLeader.java:831)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.initiateCaptureSnapshot(AbstractLeader.java:745)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleAppendEntriesReply(AbstractLeader.java:238)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.Leader.handleAppendEntriesReply(Leader.java:101)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.handleMessage(AbstractRaftActorBehavior.java:411)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleMessage(AbstractLeader.java:469)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.Leader.handleMessage(Leader.java:80)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.RaftActor.possiblyHandleBehaviorMessage(RaftActor.java:322)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:271)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
...&lt;br/&gt;
Caused by: java.io.IOException: The # of bytes read from the imput stream, -1, does not match the expected # 3075&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.LeaderInstallSnapshotState.getNextChunk(LeaderInstallSnapshotState.java:129)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.sendSnapshotChunk(AbstractLeader.java:804)&lt;span class=&quot;error&quot;&gt;&amp;#91;227:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT&amp;#93;&lt;/span&gt;&lt;/p&gt;


&lt;p&gt;I think what happened is that while node1 was isolated, node3 continuously tried to send AppendEntries at the heartbeat interval, so multiple messages were queued up and eventually delivered to node1. This resulted in multiple replies in quick succession, each reporting that the follower&apos;s last index is higher, and each reply triggered an attempt to force a snapshot install. The first reply sends the snapshot chunk and drains the stream, which causes the error when the second reply tries to send the same chunk.&lt;/p&gt;
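&lt;p&gt;The drained-stream failure described above can be illustrated with a minimal sketch. It assumes a hypothetical readChunk helper and a ByteArrayInputStream standing in for the snapshot bytes; it is not the actual LeaderInstallSnapshotState code, and the error text is only modelled on the one in the stack trace.&lt;/p&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class for illustration only; not part of sal-akka-raft.
final class DrainedStreamDemo {

    // Reads one chunk of exactly `size` bytes, or fails if the stream cannot supply it.
    // A single read() call is enough here because ByteArrayInputStream returns
    // everything it has available in one call.
    static byte[] readChunk(InputStream in, int size) throws IOException {
        byte[] chunk = new byte[size];
        int numRead = in.read(chunk);
        if (numRead != size) {
            throw new IOException("bytes read, " + numRead + ", does not match expected " + size);
        }
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        // Assumed stand-in for a 3075-byte serialized snapshot.
        InputStream snapshot = new ByteArrayInputStream(new byte[3075]);

        // First AppendEntriesReply: the chunk is read and the stream is drained.
        byte[] first = readChunk(snapshot, 3075);
        System.out.println("first reply sent " + first.length + " bytes");

        // Second reply in quick succession: the same stream is already at end-of-stream,
        // so read() returns -1 and the size check fails.
        try {
            readChunk(snapshot, 3075);
        } catch (IOException e) {
            System.out.println("second reply failed: " + e.getMessage());
        }
    }
}
```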

&lt;p&gt;This error causes the shard actor to restart, but I think this is orthogonal and wouldn&apos;t result in a 401 on the HTTP front-end unless AAA was trying to access the credentials from the datastore, and I don&apos;t think that info is stored there.&lt;/p&gt;</comment>
                            <comment id="51968" author="tpantelis" created="Tue, 11 Apr 2017 14:09:06 +0000"  >&lt;p&gt;I also noticed multiple messages like:&lt;/p&gt;

&lt;p&gt;2017-04-11 08:48:07,057 | INFO  | lt-dispatcher-16 | Shard                            | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-3-shard-default-config (Leader): follower member-1-shard-default-config last log term 2 conflicts with the leader&apos;s 3 - dec next index to -2&lt;/p&gt;

&lt;p&gt;I think this is benign. I pushed &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/54780&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/54780&lt;/a&gt; to prevent decrementing next index below -1.&lt;/p&gt;</comment>
                            <comment id="51969" author="tpantelis" created="Wed, 12 Apr 2017 02:50:09 +0000"  >&lt;p&gt;It seems the 401 issue is more general and occurs with other tests as well. &lt;a href=&quot;https://jira.opendaylight.org/browse/NETCONF-398&quot; title=&quot;Carbon: odl respond with http status 401 in various suites&quot; class=&quot;issue-link&quot; data-issue-key=&quot;NETCONF-398&quot;&gt;&lt;del&gt;NETCONF-398&lt;/del&gt;&lt;/a&gt; was opened for that. I&apos;ll use this bug to address the IOException during install snapshot.&lt;/p&gt;</comment>
                            <comment id="51970" author="tpantelis" created="Wed, 12 Apr 2017 23:58:42 +0000"  >&lt;p&gt;Submitted &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/54898/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/54898/&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13649" name="karaf_log_1.tar.gz" size="66080" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:00:51 +0000"/>
                            <attachment id="13650" name="karaf_log_2.tar.gz" size="66640" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:01:13 +0000"/>
                            <attachment id="13651" name="karaf_log_3.tar.gz" size="65752" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:01:53 +0000"/>
                            <attachment id="13648" name="log.html" size="346931" author="pgubka@cisco.com" created="Tue, 11 Apr 2017 10:00:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8206</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=8206]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10336"><![CDATA[Carbon-RC0]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02rrr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>