<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:55:40 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1488] EntityOwnershipListener missing ownershipChanged() callback</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1488</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;Problem: Our EntityOwnershipListener does not receive all callbacks. We have run several tests and we see unexpected behavior:&lt;/p&gt;

&lt;p&gt;1. Some cluster nodes do not receive a callback after a Controller cold start (such a node basically does not know whether it is master or slave) (see log 11:57 at node1)&lt;/p&gt;

&lt;p&gt;2. Some cluster nodes do not receive a callback (hasOwner=false) after the Shard Cluster Leader Node has been isolated. This is a problem because the isolated Leader can still think that it is the Leader. (see log 12:25 at node2)&lt;/p&gt;


&lt;p&gt;This impacts the fail-over cluster behavior of our custom application and the OpenFlow Plugin.&lt;/p&gt;

&lt;p&gt;Environment:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;3 node cluster&lt;/li&gt;
	&lt;li&gt;Ubuntu&lt;/li&gt;
	&lt;li&gt;Beryllium (ZIP downloaded 2016-02-23)&lt;/li&gt;
	&lt;li&gt;Modified existing test application (artifactId=clustering-it-provider groupId=org.opendaylight.controller)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;See the attached karaf.log, configuration files and source code.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26042">CONTROLLER-1488</key>
            <summary>EntityOwnershipListener missing ownershipChanged() callback</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="rovarga">Robert Varga</assignee>
                                    <reporter username="jozef.slezak@pantheon.sk">Jozef Slez&#225;k</reporter>
                        <labels>
                    </labels>
                <created>Wed, 24 Feb 2016 14:59:52 +0000</created>
                <updated>Thu, 26 May 2016 12:37:24 +0000</updated>
                            <resolved>Thu, 26 May 2016 12:37:24 +0000</resolved>
                                    <version>Beryllium</version>
                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="51304" author="tpantelis" created="Thu, 25 Feb 2016 01:07:29 +0000"  >&lt;p&gt;It would help if you could outline detailed reproduction steps for what you&apos;re seeing.&lt;/p&gt;</comment>
                            <comment id="51305" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 08:25:57 +0000"  >&lt;p&gt;The attachment contains the following files:&lt;br/&gt;
1/ OwnershipListener-logTimesAndEvents.xlsx&lt;br/&gt;
Excel file containing small summary of events on 3 cluster nodes &lt;br/&gt;
2/ OwnershipListener-logSummary.log&lt;br/&gt;
Extract logs from OwnershipListener test class implementation from all 3 cluster nodes (see marking from the beginning, with ref. to karaf file)&lt;br/&gt;
3/ Node1-..., 2, 3 subdirectories&lt;br/&gt;
Complete karaf.logs from nodes + grepped OwnershipListener class events (ownl3 - used for 2/ creation )&lt;/p&gt;</comment>
                            <comment id="51329" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 08:25:57 +0000"  >&lt;p&gt;Attachment 24022016-ClusterOwnershipListener.zip has been added with description: Karaf logs from test and table summarizing it&lt;/p&gt;</comment>
                            <comment id="51306" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 08:48:17 +0000"  >&lt;p&gt;Hi Tom,&lt;br/&gt;
Thanks for the quick response. Here are more details about the tests we executed.&lt;/p&gt;

&lt;p&gt;Environment:&lt;br/&gt;
3 Ubuntu VMs running on Oracle VM Virtual Machine (W7,64,16G RAM), ver. 5.0.14&lt;br/&gt;
Each VM running:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Ubuntu 14.04.2 LTS (3G RAM)&lt;/li&gt;
	&lt;li&gt;Beryllium distribution (fresh from &lt;a href=&quot;https://www.opendaylight.org/software/downloads/beryllium&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.opendaylight.org/software/downloads/beryllium&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;test application (org.opendaylight.controller:clustering-it) extended with an&lt;br/&gt;
OwnershipListener class&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Steps done:&lt;br/&gt;
1/ Start 3 controllers&lt;br/&gt;
2/ install replaced odl-clustering-test-app&lt;br/&gt;
3/ wait until everything started, incl. odl-clustering-test-app&lt;br/&gt;
4/ run python ./isolate.py 1 (isolate leader)&lt;br/&gt;
5/ wait until everything stabilizes&lt;br/&gt;
6/ rejoin 1&lt;br/&gt;
7/ Stop instances&lt;/p&gt;

&lt;p&gt;We also used python ./monitor.py to monitor the cluster instances &lt;br/&gt;
(PS during the test 2 instances had been restarted because of some problems)&lt;/p&gt;</comment>
                            <comment id="51307" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 10:52:06 +0000"  >&lt;p&gt;Adapted sample application sources based on&lt;br/&gt;
&lt;a href=&quot;https://github.com/opendaylight/controller/tree/stable/beryllium/opendaylight/md-sal/samples/clustering-test-app&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/opendaylight/controller/tree/stable/beryllium/opendaylight/md-sal/samples/clustering-test-app&lt;/a&gt;&lt;br/&gt;
(older release adapted)&lt;/p&gt;</comment>
                            <comment id="51330" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 10:52:06 +0000"  >&lt;p&gt;Attachment clustering-ownership-test-app.zip has been added with description: Adapted sample application sources&lt;/p&gt;</comment>
                            <comment id="51308" author="tpantelis" created="Thu, 25 Feb 2016 13:29:06 +0000"  >&lt;p&gt;I&apos;m not sure if I completely follow/understand the spreadsheet. From what I can gather, after startup, all 3 nodes registered a candidate and listener and all were notified of ownership state with node1 obtaining ownership of the entity. &lt;/p&gt;

&lt;p&gt;Wrt to &quot;isolate&quot;, I assume you broke the network connection between node1 and the other 2 nodes causing a network partition. So node1 became isolated and node2 became the new EOS leader and was granted ownership. node 3&apos;s listener was also notified of the ownership change.&lt;/p&gt;

&lt;p&gt;However node1&apos;s listener did not receive a notification that it lost ownership. This is b/c it was an isolated leader and thus can no longer make progress, i.e. it can&apos;t commit ownership changes b/c it can&apos;t get a majority consensus.&lt;/p&gt;

&lt;p&gt;After the network partition was healed, node1 became a follower to node2 and synced up at which point node1&apos;s listener was notified of the ownership change(s).&lt;/p&gt;

&lt;p&gt;It seems this scenario corresponds to #2 in the bug desc.&lt;/p&gt;

&lt;p&gt;I&apos;m unclear as to the rest of sequence in the spreadsheet.&lt;/p&gt;

&lt;p&gt;I&apos;m not clear on what #1 means but it sounds like &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/34674/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/34674/&lt;/a&gt;. Do you have that patch?&lt;/p&gt;</comment>
                            <comment id="51309" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 14:31:03 +0000"  >&lt;p&gt;Excel file corresponds to manual events and to EntityOwnershipListener notification events.&lt;/p&gt;

&lt;p&gt;Excel lines with cell content Leader/Slave correspond to all karaf log records for EntityOwnershipListener notification events. You can also see the timestamp from karaf.log there.&lt;/p&gt;

&lt;p&gt;Log entries are from &lt;br/&gt;
OwnershipListener.ownershipChanged(EntityOwnershipChange entityOwnershipChange)&lt;br/&gt;
method (entry to it). Additionally you can see also listener registration and closing (Created instance, Close session events)&lt;/p&gt;

&lt;p&gt;I haven&apos;t applied the patch separately, so it depends on whether it is in the prebuilt Beryllium.&lt;br/&gt;
Uploaded a slightly modified Excel file.&lt;/p&gt;</comment>
                            <comment id="51331" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 14:32:28 +0000"  >&lt;p&gt;Attachment OwnershipListener-logTimesAndEvents&lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt;.xlsx has been added with description: Updated excel file (comments)&lt;/p&gt;</comment>
                            <comment id="51310" author="rovarga" created="Thu, 25 Feb 2016 14:44:32 +0000"  >&lt;p&gt;Tom, that sounds correct. It is true that the minority partition cannot make forward progress when touching internal ODL resources, but EOS is usually used to present a single master to outside resources.&lt;/p&gt;

&lt;p&gt;Since the connectivity affects only cluster nodes, an outside entity (like a OF switch) connected to both partitions will see two leaders.&lt;/p&gt;

&lt;p&gt;Hence if we know we are in the minority partition, we need to bring the master down. I am not sure if this should be done via a normal &apos;isOwner=false hasOwner=false&apos; report, or via a more dire means.&lt;/p&gt;</comment>
                            <comment id="51311" author="tpantelis" created="Thu, 25 Feb 2016 15:19:17 +0000"  >&lt;p&gt;Yeah that&apos;s a hole resulting from using the dataTree/raft to trigger ownership change events. It seems the minority node would have to generate a &quot;fake&quot; event with &quot;wasOwner=true, isOwner=false&quot;. However the hasOwner part wouldn&apos;t be known. It seems we would need &quot;hasOwner=unknown&quot; state.&lt;/p&gt;

&lt;p&gt;I think this may need another type of explicit event for this case. But we&apos;re limited with the current API. The new API that was put into mdsal in Be would give us the flexibility to do this. We need to implement the new API in Boron - another thing to put on the list &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.opendaylight.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Knowing whether we are in a minority partition may be tricky. If the EOS leader becomes partitioned, we detect that via the IsolatedLeader raft state. However it doesn&apos;t know if the majority of followers are actually down or if they&apos;re up and there&apos;s a network partition. Is it valid to send an event indicating the local node should no longer be master w/o knowing whether or not a majority cluster still remains that will elect a new leader and re-assign ownership?&lt;/p&gt;

                            </comment>
                            <comment id="51312" author="tpantelis" created="Thu, 25 Feb 2016 15:23:23 +0000"  >&lt;p&gt;Based on Robert&apos;s comment, it looks like we&apos;ve identified #2.&lt;/p&gt;

&lt;p&gt;But I&apos;m still not clear on #1 - need more explanation...&lt;/p&gt;

                            </comment>
                            <comment id="51313" author="tpantelis" created="Thu, 25 Feb 2016 15:32:41 +0000"  >&lt;p&gt;I&apos;m having trouble following/understanding the spreadsheet, e.g. layout and terminology. It would help if you could outline it step by step, i.e. what actions were taken, what occurred and what didn&apos;t occur that was expected, e.g.&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;started all 3 nodes&lt;/li&gt;
	&lt;li&gt;node2 got change event: wasOwner: F, isOwner: T, hasOwner: T (expected)&lt;/li&gt;
	&lt;li&gt;node1 got change event: wasOwner: F, isOwner: F, hasOwner: T (expected)&lt;/li&gt;
	&lt;li&gt;stopped the karaf process on node2&lt;/li&gt;
	&lt;li&gt;node1 got change event: wasOwner: F, isOwner: T, hasOwner: T (expected) OR expected node1 change event: wasOwner: F, isOwner: T, hasOwner: T did not occur&lt;/li&gt;
&lt;/ul&gt;



                            </comment>
                            <comment id="51314" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 19:09:38 +0000"  >&lt;p&gt;See sheet no. 2, which lists the events in a different form.&lt;br/&gt;
It contains the main events from the logs (INIT/CLOSE/LEADER/FOLLOW) for each node as they happened. In the comment you can also see the matching line on LEADER.&lt;/p&gt;

&lt;p&gt;(similar to a sequence chart)&lt;br/&gt;
From the Excel file you can also see problematic points 1/ and 2/&lt;br/&gt;
Hope this helps more, sorry for the confusion with the original form (the same data)&lt;img class=&quot;emoticon&quot; src=&quot;https://jira.opendaylight.org/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="51332" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 19:09:38 +0000"  >&lt;p&gt;Attachment OwnershipListener-logTimesAndEventsV2.xlsx has been added with description: Inserted page with time-sorted events from logs&lt;/p&gt;</comment>
                            <comment id="51315" author="tpantelis" created="Thu, 25 Feb 2016 20:28:03 +0000"  >&lt;p&gt;Thanks that helps a lot. Case 1 looks like you&apos;re missing &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/34674/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/34674/&lt;/a&gt;. It was merged to stable/be on Feb 19th. I would suggest retrying case 1 with the latest stable/be to ensure you have that patch.&lt;/p&gt;

                            </comment>
                            <comment id="51316" author="tsirovat@cisco.com" created="Fri, 26 Feb 2016 07:00:19 +0000"  >&lt;p&gt;Can you quickly tell me how I can check whether the merge is in or not?&lt;br/&gt;
(I downloaded the prepackaged karaf on the 23rd in the morning)&lt;br/&gt;
It would help. Thanks!&lt;/p&gt;

                            </comment>
                            <comment id="51317" author="tpantelis" created="Fri, 26 Feb 2016 12:48:52 +0000"  >&lt;p&gt;So when you tested you had obtained a build/distro on Feb 23rd... where did you get it from? Was it the Be release distro? If so then Robert&apos;s patch wouldn&apos;t be in there since it was merged post release.&lt;/p&gt;</comment>
                            <comment id="51318" author="tsirovat@cisco.com" created="Fri, 26 Feb 2016 17:34:44 +0000"  >&lt;p&gt;Yes, it was the path from the official announcement, so the patch is not there.&lt;br/&gt;
( &lt;a href=&quot;https://www.opendaylight.org/software/downloads/beryllium&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.opendaylight.org/software/downloads/beryllium&lt;/a&gt; )&lt;/p&gt;


                            </comment>
                            <comment id="51319" author="jozef.slezak@pantheon.sk" created="Tue, 1 Mar 2016 15:41:51 +0000"  >&lt;p&gt;Tomorrow we are planning to retest using config-parent version=0.4.1-SNAPSHOT to see whether the patch helped. I will give you an update.&lt;/p&gt;</comment>
                            <comment id="51320" author="michal.banik@pantheon.sk" created="Fri, 4 Mar 2016 18:11:54 +0000"  >&lt;p&gt;retested and problem 1 seems to be solved, but problem 2 is still there&lt;/p&gt;

&lt;p&gt;added similar sorted events in excel file - events on 3node cluster and ownership state on nodes&lt;/p&gt;</comment>
                            <comment id="51333" author="michal.banik@pantheon.sk" created="Fri, 4 Mar 2016 18:11:54 +0000"  >&lt;p&gt;Attachment OwnershipListener-logTimesAndEventsV2_public.xlsx has been added with description: sorted events after retest&lt;/p&gt;</comment>
                            <comment id="51321" author="rovarga" created="Fri, 25 Mar 2016 00:53:27 +0000"  >&lt;p&gt;The second part requires exposing cluster synchronization status to the application, so it can detect a split-brain situation and act accordingly.&lt;/p&gt;

&lt;p&gt;In our RAFT implementation (which backs Entity Ownership Service) this happens during elections (since we do not know who the leader is) and when the leader becomes isolated (since the peers which it cannot reach can perform new elections).&lt;/p&gt;

&lt;p&gt;Applications which need to be sure that the cluster is in consistent state for them to operate can choose to suspend execution (without actually stopping) while EOS remains in jeopardy.&lt;/p&gt;

&lt;p&gt;master: &lt;a href=&quot;https://git.opendaylight.org/gerrit/36752&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/36752&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="51322" author="colin@colindixon.com" created="Tue, 5 Apr 2016 19:43:35 +0000"  >

&lt;p&gt;To be clear, you never really know when you&apos;re in split brain. You only know the last time you were able to make progress or hear about progress. You can decide that if no progress has been made for 30 seconds (or an hour or whatever) that you&apos;re in a split-brain mode, but you&apos;ll never know.&lt;/p&gt;

&lt;p&gt;Adding another mechanism to use/detect this seems dangerous.&lt;/p&gt;</comment>
                            <comment id="51323" author="rovarga" created="Wed, 6 Apr 2016 11:53:58 +0000"  >&lt;p&gt;I do not disagree, but I think you misunderstand what the patch does, or I misunderstand the point you are making. Since you have made the comments after this has been thoroughly discussed on the Clustering Hackers&apos; call, I would like you to be more explicit about your assumptions and analysis so that we can be sure we are on the same page.&lt;/p&gt;

&lt;p&gt;The patch certainly does not introduce another detection mechanism, but rather exposes the confidence of the &lt;b&gt;EOS implementation&lt;/b&gt; in the value it is reporting.&lt;/p&gt;

&lt;p&gt;The confidence level is directly tied to RAFT (which is an implementation detail and the application has no notion of it existing), e.g. this triggers when we:&lt;br/&gt;
a) do not have consensus based on AppendEntries&lt;br/&gt;
b) are not sure that our local view represents cluster-wide consensus&lt;/p&gt;</comment>
                            <comment id="51334" author="tsirovat@cisco.com" created="Wed, 6 Apr 2016 13:10:20 +0000"  >&lt;p&gt;Attachment OwnershipListener-logTimesAndEvents-patch36752-05042016.xlsx has been added with description: Excel with summary from retest with patch applied&lt;/p&gt;</comment>
                            <comment id="51335" author="tsirovat@cisco.com" created="Wed, 6 Apr 2016 13:28:11 +0000"  >&lt;p&gt;Attachment 05042016-patchTest-karaf-logs.zip has been added with description: karaf.log files from retest with patch applied&lt;/p&gt;</comment>
                            <comment id="51324" author="tsirovat@cisco.com" created="Wed, 6 Apr 2016 14:00:07 +0000"  >&lt;p&gt;Retested with the following artefacts:&lt;/p&gt;

&lt;p&gt;1/ base Beryllium build (&lt;a href=&quot;https://github.com/opendaylight/integration-distribution/tree/stable/beryllium/distribution-karaf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/opendaylight/integration-distribution/tree/stable/beryllium/distribution-karaf&lt;/a&gt;, 1.6.2-SNAPSHOT )&lt;br/&gt;
2/ applied patch files (&lt;a href=&quot;https://git.opendaylight.org/gerrit/36752&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/36752&lt;/a&gt;)&lt;br/&gt;
3/ built and replaced the following artefacts in the deployment&lt;br/&gt;
&amp;#8211; system/org/opendaylight/controller/sal-common-api/1.3.2-SNAPSHOT/sal-common-api-1.3.2-SNAPSHOT.jar&lt;br/&gt;
&amp;#8211; system/org/opendaylight/controller/sal-distributed-datastore/1.3.2-SNAPSHOT/sal-distributed-datastore-1.3.2-SNAPSHOT.jar&lt;br/&gt;
4/ updated odl-clustering-test-app and deployed it in karaf&lt;br/&gt;
5/ set DEBUG logging for the clustering components&lt;br/&gt;
log:set DEBUG org.opendaylight.controller.md.sal.common.api.clustering&lt;br/&gt;
log:set DEBUG org.opendaylight.controller.cluster&lt;/p&gt;

&lt;p&gt;CASE 1 - OK (as already verified by Michal)&lt;br/&gt;
CASE 2 - OK. After isolation, the Leader gets the inJeopardy=true state&lt;br/&gt;
and becomes IsolatedLeader (still isOwner=true); the other 2 controllers vote for a new Leader, and one of them becomes the new Leader.&lt;br/&gt;
After rejoin, inJeopardy=false and a new Leader is elected from all 3 controllers.&lt;/p&gt;

&lt;p&gt;So it seems this case should be handled correctly with the inJeopardy state&lt;br/&gt;
(signalling only that the Leader has been isolated)&lt;/p&gt;


&lt;p&gt;-----------------------------------------------------------------------------&lt;br/&gt;
PS: I need to recheck the sync status between cluster-monitor.py and the notification events, because in another test they did not seem to be in sync.&lt;br/&gt;
The reason may be the Shard registered by the listener:&lt;/p&gt;

&lt;p&gt;2016-04-05 11:40:03,516 | DEBUG | lt-dispatcher-17 | EntityOwnershipShard             | 166 - org.opendaylight.controller.sal-akka-raft - 1.3.2.SNAPSHOT | member-1-shard-entity-ownership-operational: onRegisterListenerLocal: RegisterListenerLocal &lt;span class=&quot;error&quot;&gt;&amp;#91;entityType=node-cluster-shard-info, listener=org.opendaylight.controller.ownership.OwnershipListener@18327f39&amp;#93;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;If I find any problem, I will open a separate ticket for it.&lt;/p&gt;</comment>
                            <comment id="51325" author="tsirovat@cisco.com" created="Fri, 8 Apr 2016 12:19:03 +0000"  >&lt;p&gt;(In reply to Tibor Sirovatka from comment #24)&lt;/p&gt;

&lt;p&gt;Leader status is correct (I had overlooked another leader change and the notification that followed 5 minutes later). However, member-1-shard-entity-ownership-operational is not listed in the modules.conf cluster configuration file.&lt;/p&gt;</comment>
                            <comment id="51326" author="tpantelis" created="Fri, 8 Apr 2016 12:53:37 +0000"  >&lt;p&gt;That is by design. This shard is dynamically created on startup as it must exist on all nodes.&lt;/p&gt;

&lt;p&gt;(In reply to Tibor Sirovatka from comment #25)&lt;br/&gt;
&amp;gt; member-1-shard-entity-ownership-operational is not listed in modules.conf&lt;br/&gt;
&amp;gt; cluster configuration file.&lt;/p&gt;</comment>
                            <comment id="51327" author="tsirovat@cisco.com" created="Fri, 8 Apr 2016 13:15:13 +0000"  >&lt;p&gt;Thanks, Tom, for the quick answer! How can I specify that I am interested in Leader changes for a specific application? (We want to use OwnershipListener as a general service for our modules.)&lt;br/&gt;
Thanks a lot!&lt;br/&gt;
(In reply to Tom Pantelis from comment #26)&lt;br/&gt;
&amp;gt; That is by design. This shard is dynamically created on startup as it must&lt;br/&gt;
&amp;gt; exist on all nodes.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; (In reply to Tibor Sirovatka from comment #25)&lt;br/&gt;
&amp;gt; &amp;gt; member-1-shard-entity-ownership-operational is not listed in modules.conf&lt;br/&gt;
&amp;gt; &amp;gt; cluster configuration file.&lt;/p&gt;</comment>
                            <comment id="51328" author="colin@colindixon.com" created="Tue, 12 Apr 2016 16:49:44 +0000"  >&lt;p&gt;(In reply to Robert Varga from comment #19)&lt;br/&gt;
&amp;gt; The second part requires exposing cluster synchronization status to the&lt;br/&gt;
&amp;gt; application, so it can detect a split-brain situation and act accordingly.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; In our RAFT implementation (which backs Entity Ownership Service) this&lt;br/&gt;
&amp;gt; happens during elections (since we do not know who the leader is) and when&lt;br/&gt;
&amp;gt; the leader becomes isolated (since the peers which it cannot reach can&lt;br/&gt;
&amp;gt; perform new elections).&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Applications which need to be sure that the cluster is in consistent state&lt;br/&gt;
&amp;gt; for them to operate can choose to suspend execution (without actually&lt;br/&gt;
&amp;gt; stopping) while EOS remains in jeopardy.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; master: &lt;a href=&quot;https://git.opendaylight.org/gerrit/36752&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/36752&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As an aside, it seems like &quot;provide hook to detect split brain in entity ownership service&quot; should really be a separate bug from this one.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10000">
                    <name>Blocks</name>
                                            <outwardlinks description="blocks">
                                        <issuelink>
            <issuekey id="26046">CONTROLLER-1492</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="26980">MDSAL-158</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13574" name="05042016-patchTest-karaf-logs.zip" size="651185" author="tsirovat@cisco.com" created="Wed, 6 Apr 2016 13:28:11 +0000"/>
                            <attachment id="13568" name="24022016-ClusterOwnershipListener.zip" size="247002" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 08:25:57 +0000"/>
                            <attachment id="13573" name="OwnershipListener-logTimesAndEvents-patch36752-05042016.xlsx" size="16908" author="tsirovat@cisco.com" created="Wed, 6 Apr 2016 13:10:20 +0000"/>
                            <attachment id="13571" name="OwnershipListener-logTimesAndEventsV2.xlsx" size="16480" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 19:09:38 +0000"/>
                            <attachment id="13572" name="OwnershipListener-logTimesAndEventsV2_public.xlsx" size="11882" author="michal.banik@pantheon.sk" created="Fri, 4 Mar 2016 18:11:54 +0000"/>
                            <attachment id="13570" name="OwnershipListener-logTimesAndEvents[1].xlsx" size="10447" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 14:32:28 +0000"/>
                            <attachment id="13569" name="clustering-ownership-test-app.zip" size="32914" author="tsirovat@cisco.com" created="Thu, 25 Feb 2016 10:52:06 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5414</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=5414]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10341"><![CDATA[Beryllium-3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02qvr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>