[CONTROLLER-1488] EntityOwnershipListener missing ownershipChanged() callback Created: 24/Feb/16  Updated: 26/May/16  Resolved: 26/May/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: Jozef Slezák Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive 05042016-patchTest-karaf-logs.zip     Zip Archive 24022016-ClusterOwnershipListener.zip     Microsoft Word OwnershipListener-logTimesAndEvents-patch36752-05042016.xlsx     Microsoft Word OwnershipListener-logTimesAndEventsV2.xlsx     Microsoft Word OwnershipListener-logTimesAndEventsV2_public.xlsx     Microsoft Word OwnershipListener-logTimesAndEvents[1].xlsx     Zip Archive clustering-ownership-test-app.zip    
Issue Links:
Blocks
blocks CONTROLLER-1492 Milestone: Cluster-wide service manag... Resolved
blocks MDSAL-158 EntityOwnershipService does not speci... Confirmed
External issue ID: 5414

 Description   

Problem: Our EntityOwnershipListener does not receive all callbacks. We have run several tests and we see unexpected behavior:

1. Some cluster nodes do not receive a callback after a Controller cold start (such a node basically does not know whether it is master or slave) (see log 11:57 at node1)

2. Some cluster nodes do not receive a callback (hasOwner=false) after the Shard Cluster Leader Node has
been isolated. This is a problem because the Leader can still think that it is the Leader. (see log 12:25 at node2)

This impacts the fail-over cluster behavior of our custom application and the OpenFlow Plugin.

Environment:

  • 3 node cluster
  • Ubuntu
  • Beryllium (ZIP downloaded 2016-02-23)
  • Modified existing test application (artifactId=clustering-it-provider groupId=org.opendaylight.controller)

See attached karaf.log, configuration files and source codes.
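
For reference, the registration pattern such a test application exercises looks roughly like the following. This is a minimal sketch assuming the Beryllium controller EOS API (org.opendaylight.controller.md.sal.common.api.clustering); the entity type and name are hypothetical, not the identifiers used in the attached sources:

    import org.opendaylight.controller.md.sal.common.api.clustering.CandidateAlreadyRegisteredException;
    import org.opendaylight.controller.md.sal.common.api.clustering.Entity;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipCandidateRegistration;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListenerRegistration;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipService;

    public class OwnershipRegistration implements AutoCloseable {
        // Hypothetical identifiers; the attached test app uses its own entity type and name.
        private static final String ENTITY_TYPE = "test-app";
        private static final Entity ENTITY = new Entity(ENTITY_TYPE, "test-entity");

        private final EntityOwnershipListenerRegistration listenerReg;
        private final EntityOwnershipCandidateRegistration candidateReg;

        public OwnershipRegistration(EntityOwnershipService eos, EntityOwnershipListener listener)
                throws CandidateAlreadyRegisteredException {
            // The listener receives ownershipChanged() callbacks for all entities of ENTITY_TYPE.
            listenerReg = eos.registerListener(ENTITY_TYPE, listener);
            // Registering a candidate makes this node eligible to become owner of ENTITY.
            candidateReg = eos.registerCandidate(ENTITY);
        }

        @Override
        public void close() {
            candidateReg.close();
            listenerReg.close();
        }
    }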



 Comments   
Comment by Tom Pantelis [ 25/Feb/16 ]

It would help if you could outline detailed reproduction steps for what you're seeing.

Comment by Tibor Sirovatka [ 25/Feb/16 ]

The attachment contains the following files:
1/ OwnershipListener-logTimesAndEvents.xlsx
Excel file containing small summary of events on 3 cluster nodes
2/ OwnershipListener-logSummary.log
Extracted logs from the OwnershipListener test class implementation on all 3 cluster nodes (see the marking at the beginning of each line, with a reference to the karaf file)
3/ Node1-..., 2, 3 subdirectories
Complete karaf.logs from the nodes + grepped OwnershipListener class events (ownl3 - used to create 2/)

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Attachment 24022016-ClusterOwnershipListener.zip has been added with description: Karaf logs from the test and a table summarizing it

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Hi Tom,
Thanks for the quick response. Here are more details about the tests we executed.

Environment:
3 Ubuntu VMs running on Oracle VM VirtualBox, ver. 5.0.14 (Windows 7, 64-bit, 16 GB RAM host)
Each VM running:

Steps done:
1/ Start 3 controllers
2/ Install the replaced odl-clustering-test-app
3/ Wait until everything has started, incl. odl-clustering-test-app
4/ Run python ./isolate.py 1 (isolate the leader)
5/ Wait until everything stabilizes
6/ Rejoin node 1
7/ Stop instances

We also used python ./monitor.py to monitor the cluster instances.
(PS: during the test, 2 instances were restarted because of some problems)

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Adapted sample application sources based on
https://github.com/opendaylight/controller/tree/stable/beryllium/opendaylight/md-sal/samples/clustering-test-app
(older release adapted)

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Attachment clustering-ownership-test-app.zip has been added with description: Adapted sample application sources

Comment by Tom Pantelis [ 25/Feb/16 ]

I'm not sure if I completely follow/understand the spreadsheet. From what I can gather, after startup, all 3 nodes registered a candidate and listener and all were notified of ownership state with node1 obtaining ownership of the entity.

Wrt "isolate", I assume you broke the network connection between node1 and the other 2 nodes, causing a network partition. So node1 became isolated and node2 became the new EOS leader and was granted ownership. node3's listener was also notified of the ownership change.

However node1's listener did not receive a notification that it lost ownership. This is b/c it was an isolated leader and thus can no longer make progress, i.e. it can't commit ownership changes b/c it can't get a majority consensus.

After the network partition was healed, node1 became a follower to node2 and synced up at which point node1's listener was notified of the ownership change(s).

It seems this scenario corresponds to #2 in the bug desc.

I'm unclear as to the rest of the sequence in the spreadsheet.

I'm not clear on what #1 means but it sounds like https://git.opendaylight.org/gerrit/#/c/34674/. Do you have that patch?

Comment by Tibor Sirovatka [ 25/Feb/16 ]

The Excel file corresponds to the manual events and to the EntityOwnershipListener notification events.

Excel lines with the cell content Leader/Slave correspond to all karaf log records for EntityOwnershipListener notification events. You can also see the timestamp from karaf.log there.

The log entries are from the
OwnershipListener.ownershipChanged(EntityOwnershipChange entityOwnershipChange)
method (on entry to it). Additionally, you can also see listener registration and closing (Created instance, Close session events).
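
In other words, a listener of roughly this shape produces those entries. A minimal sketch, assuming the Beryllium controller EOS API; the log wording here is illustrative, not the exact format in the attachment:

    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipChange;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class OwnershipListener implements EntityOwnershipListener {
        private static final Logger LOG = LoggerFactory.getLogger(OwnershipListener.class);

        @Override
        public void ownershipChanged(EntityOwnershipChange change) {
            // These flags are what the spreadsheet summarizes as Leader/Slave per node.
            LOG.info("{}: wasOwner={} isOwner={} hasOwner={} -> {}",
                    change.getEntity(), change.wasOwner(), change.isOwner(), change.hasOwner(),
                    change.isOwner() ? "LEADER" : "FOLLOWER");
        }
    }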

I haven't applied the patch separately, so it depends on whether it is in the prebuilt Beryllium.
Uploaded a slightly modified Excel file.

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Attachment OwnershipListener-logTimesAndEvents[1].xlsx has been added with description: Updated excel file (comments)

Comment by Robert Varga [ 25/Feb/16 ]

Tom, that sounds correct. It is true that the minority partition cannot make forward progress when touching internal ODL resources, but EOS is usually used to present a single master to outside resources.

Since the connectivity affects only cluster nodes, an outside entity (like an OF switch) connected to both partitions will see two leaders.

Hence if we know we are in the minority partition, we need to bring the master down. I am not sure if this should be done via a normal 'isOwner=false hasOwner=false' report, or via a more dire means.

Comment by Tom Pantelis [ 25/Feb/16 ]

Yeah that's a hole resulting from using the dataTree/raft to trigger ownership change events. It seems the minority node would have to generate a "fake" event with "wasOwner=true, isOwner=false". However the hasOwner part wouldn't be known. It seems we would need "hasOwner=unknown" state.

I think this may need another type of explicit event for this case. But we're limited with the current API. The new API that was put into mdsal in Be would give us the flexibility to do this. We need to implement the new API in Boron - another thing to put on the list

Knowing whether we are in a minority partition may be tricky. If the EOS leader becomes partitioned, we detect that via the IsolatedLeader raft state. However it doesn't know if the majority of followers are actually down or if they're up and there's a network partition. Is it valid to send an event indicating the local node should no longer be master w/o knowing whether or not a majority cluster still remains that will elect a new leader and re-assign ownership?

(In reply to Robert Varga from comment #8)

Comment by Tom Pantelis [ 25/Feb/16 ]

Based on Robert's comment, it looks like we've identified #2.

But I'm still not clear on #1 - need more explanation...

(In reply to Tom Pantelis from comment #5)

Comment by Tom Pantelis [ 25/Feb/16 ]

I'm having trouble following/understanding the spreadsheet, e.g. layout and terminology. It would help if you could outline it step by step, i.e. what actions were taken, what occurred and what didn't occur that was expected, e.g.

  • started all 3 nodes
  • node2 got change event: wasOwner: F, isOwner: T, hasOwner: T (expected)
  • node1 got change event: wasOwner: F, isOwner: F, hasOwner: T (expected)
  • stopped the karaf process on node2
  • node1 got change event: wasOwner: F, isOwner: T, hasOwner: T (expected) OR expected node1 change event: wasOwner: F, isOwner: T, hasOwner: T did not occur

(In reply to Tom Pantelis from comment #10)

Comment by Tibor Sirovatka [ 25/Feb/16 ]

See Sheet Nr 2, which lists the events in a different form.
It contains the main events from the logs (INIT/CLOSE/LEADER/FOLLOW) for each node as they happened. In the comment you can also see the matching line for LEADER.

(similar to a sequence chart)
From the Excel file you can also see the problematic points 1/ and 2/.
Hope this helps; sorry for the confusion with the original form (it is the same data).

Comment by Tibor Sirovatka [ 25/Feb/16 ]

Attachment OwnershipListener-logTimesAndEventsV2.xlsx has been added with description: Inserted page with time-sorted events from logs

Comment by Tom Pantelis [ 25/Feb/16 ]

Thanks that helps a lot. Case 1 looks like you're missing https://git.opendaylight.org/gerrit/#/c/34674/. It was merged to stable/be on Feb 19th. I would suggest retrying case 1 with the latest stable/be to ensure you have that patch.

(In reply to Tibor Sirovatka from comment #12)

Comment by Tibor Sirovatka [ 26/Feb/16 ]

Can you give me quick info on how I can check whether the merge is in or not?
(I downloaded the prepacked karaf on the 23rd in the morning.)
It would help. Thanks!

(In reply to Tom Pantelis from comment #13)

Comment by Tom Pantelis [ 26/Feb/16 ]

So when you tested you had obtained a build/distro on Feb 23rd... where did you get it from? Was it the Be release distro? If so then Robert's patch wouldn't be in there since it was merged post release.

Comment by Tibor Sirovatka [ 26/Feb/16 ]

Yes, it was the one from the official announcement, so the patch is not there.
( https://www.opendaylight.org/software/downloads/beryllium )

(In reply to Tom Pantelis from comment #15)

Comment by Jozef Slezák [ 01/Mar/16 ]

Tomorrow we are planning to retest using config-parent version=0.4.1-SNAPSHOT to see whether the patch helped. I will give you an update.

Comment by Michal Banik [ 04/Mar/16 ]

Retested, and problem 1 seems to be solved, but problem 2 is still there.

Added similarly sorted events in the Excel file - events on the 3-node cluster and the ownership state on the nodes.

Comment by Michal Banik [ 04/Mar/16 ]

Attachment OwnershipListener-logTimesAndEventsV2_public.xlsx has been added with description: sorted events after retest

Comment by Robert Varga [ 25/Mar/16 ]

The second part requires exposing cluster synchronization status to the application, so it can detect a split-brain situation and act accordingly.

In our RAFT implementation (which backs Entity Ownership Service) this happens during elections (since we do not know who the leader is) and when the leader becomes isolated (since the peers which it cannot reach can perform new elections).

Applications which need to be sure that the cluster is in a consistent state for them to operate can choose to suspend execution (without actually stopping) while EOS remains in jeopardy.

master: https://git.opendaylight.org/gerrit/36752
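
To illustrate, an application could react to that state in its ownership callback. A rough sketch, assuming the jeopardy state is exposed on EntityOwnershipChange as inJeopardy() (as the retest below observes); the suspend/resume hooks are hypothetical application code:

    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipChange;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;

    public class JeopardyAwareListener implements EntityOwnershipListener {

        @Override
        public void ownershipChanged(EntityOwnershipChange change) {
            if (change.inJeopardy()) {
                // We may be in a minority partition: the reported ownership state cannot be
                // trusted, so stop acting as master without tearing the application down.
                suspendMasterDuties();
                return;
            }
            if (change.isOwner()) {
                resumeMasterDuties();
            } else {
                suspendMasterDuties();
            }
        }

        // Hypothetical hooks: the real application would pause/resume its master role here.
        private void suspendMasterDuties() { }

        private void resumeMasterDuties() { }
    }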

Comment by Colin Dixon [ 05/Apr/16 ]

(In reply to Robert Varga from comment #19)

To be clear, you never really know when you're in split brain. You only know the last time you were able to make progress or hear about progress. You can decide that if no progress has been made for 30 seconds (or an hour or whatever) you're in a split-brain mode, but you'll never know.

Adding another mechanism to use/detect this seems dangerous.

Comment by Robert Varga [ 06/Apr/16 ]

I do not disagree, but I think you misunderstand what the patch does, or I misunderstand the point you are making. Since you have made the comments after this has been thoroughly discussed on the Clustering Hackers' call, I would like you to be more explicit about your assumptions and analysis so that we can be sure we are on the same page.

The patch certainly does not introduce another detection mechanism, but rather exposes the confidence of the EOS implementation in the value it is reporting.

The confidence level is directly tied to RAFT (which is an implementation detail and the application has no notion of it existing), e.g. this triggers when we:
a) do not have consensus based on AppendEntries
b) are not sure that our local view represents cluster-wide consensus

Comment by Tibor Sirovatka [ 06/Apr/16 ]

Attachment OwnershipListener-logTimesAndEvents-patch36752-05042016.xlsx has been added with description: Excel with summary from retest with patch applied

Comment by Tibor Sirovatka [ 06/Apr/16 ]

Attachment 05042016-patchTest-karaf-logs.zip has been added with description: karaf.log files from retest with patch applied

Comment by Tibor Sirovatka [ 06/Apr/16 ]

Retested with the following artefacts:

1/ Base Beryllium build (https://github.com/opendaylight/integration-distribution/tree/stable/beryllium/distribution-karaf, 1.6.2-SNAPSHOT)
2/ Applied the patch files (https://git.opendaylight.org/gerrit/36752)
3/ Built and replaced the following artefacts in the deployment:
– system/org/opendaylight/controller/sal-common-api/1.3.2-SNAPSHOT/sal-common-api-1.3.2-SNAPSHOT.jar
– system/org/opendaylight/controller/sal-distributed-datastore/1.3.2-SNAPSHOT/sal-distributed-datastore-1.3.2-SNAPSHOT.jar
4/ Updated odl-clustering-test-app and deployed it in karaf
5/ Set DEBUG logging for the clustering components:
log:set DEBUG org.opendaylight.controller.md.sal.common.api.clustering
log:set DEBUG org.opendaylight.controller.cluster

CASE 1 - OK (as already verified by Michal)
CASE 2 - OK. After isolation, the Leader gets the inJeopardy=true state
and becomes IsolatedLeader (still isOwner=true); the other 2 controllers vote for a new Leader, and one of them becomes the new Leader.
After rejoin, inJeopardy=false and a new Leader is elected from all 3 controllers.

So it seems this case should be handled correctly with the inJeopardy state
(signalling that just the Leader had been isolated).

-----------------------------------------------------------------------------
PS: I need to recheck the sync status between cluster-monitor.py and the notification event, because it seemed from another test that they are not in sync.
The reason could be the Shard registered by the listener:

2016-04-05 11:40:03,516 | DEBUG | lt-dispatcher-17 | EntityOwnershipShard | 166 - org.opendaylight.controller.sal-akka-raft - 1.3.2.SNAPSHOT | member-1-shard-entity-ownership-operational: onRegisterListenerLocal: RegisterListenerLocal [entityType=node-cluster-shard-info, listener=org.opendaylight.controller.ownership.OwnershipListener@18327f39]

If any problem is found, I will open a separate ticket for it.

Comment by Tibor Sirovatka [ 08/Apr/16 ]

(In reply to Tibor Sirovatka from comment #24)

The Leader status is correct (I overlooked another leader change and the following notification 5 minutes later). Anyway, member-1-shard-entity-ownership-operational is not listed in the modules.conf cluster configuration file.

Comment by Tom Pantelis [ 08/Apr/16 ]

That is by design. This shard is dynamically created on startup as it must exist on all nodes.

(In reply to Tibor Sirovatka from comment #25)

Comment by Tibor Sirovatka [ 08/Apr/16 ]

Thanks, Tom, for the quick answer! How can I specify that I am interested in Leader changes for a specific application? (We want to use the OwnershipListener as a general service for our modules.)
Thanks a lot!
(In reply to Tom Pantelis from comment #26)
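
One note on scoping, as a sketch only: the listener registration takes an entity type string, so an application can keep its ownership changes separate by using its own entity type and names. The identifiers below are hypothetical, assuming the Beryllium controller EOS API:

    import org.opendaylight.controller.md.sal.common.api.clustering.CandidateAlreadyRegisteredException;
    import org.opendaylight.controller.md.sal.common.api.clustering.Entity;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;
    import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipService;

    public final class AppOwnership {
        // Hypothetical application-specific entity type and name.
        private static final String APP_ENTITY_TYPE = "my-app-service";

        public static void register(EntityOwnershipService eos, EntityOwnershipListener listener)
                throws CandidateAlreadyRegisteredException {
            // Only changes for entities of APP_ENTITY_TYPE reach this listener; entities of
            // other types (e.g. the OpenFlow plugin's) are delivered to their own listeners.
            eos.registerListener(APP_ENTITY_TYPE, listener);
            eos.registerCandidate(new Entity(APP_ENTITY_TYPE, "my-app-instance"));
        }
    }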

Comment by Colin Dixon [ 12/Apr/16 ]

(In reply to Robert Varga from comment #19)

As an aside, it seems like "provide hook to detect split brain in entity ownership service" should really be a separate bug from this one.
