[MDSAL-195] ClusterSingletonService is not closed after Leader moves to IsolatedLeader Created: 26/Aug/16  Updated: 09/Mar/18  Resolved: 13/Oct/16

Status: Resolved
Project: mdsal
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Martin Mihálek Assignee: Vaclav Demcak
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File member1_karaf.log     Text File member2_karaf.log     Text File member3_karaf.log     Zip Archive member_logs.zip     Zip Archive members.zip     Zip Archive members_1.zip    
Issue Links:
Blocks
blocks BGPCEP-540 ClusterSingletonService/EOS race cond... Resolved
blocks OPNFLWPLUG-728 Entity Owner not resolved properly af... Resolved
blocks SXP-100 Add blueprint and clustering support ... Resolved
External issue ID: 6540
Priority: High

 Description   

After isolation of one leader node, a new leader is selected and starts its services, but the old leader does not close its services and becomes unresponsive.

2016-08-25 11:30:46,838 | WARN | ult-dispatcher-3 | ShardManager | 169 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Supervisor Strategy caught unexpected exception - resuming
java.lang.IllegalArgumentException: Invalid combination of wasOwner: true, isOwner: true, hasOwner: true
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)[38:com.google.guava:18.0.0]
at org.opendaylight.mdsal.eos.common.api.EntityOwnershipChangeState.from(EntityOwnershipChangeState.java:100)[127:org.opendaylight.mdsal.eos-common-api:2.2.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerSupport.notifyListeners(EntityOwnershipListenerSupport.java:107)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerSupport.notifyListeners(EntityOwnershipListenerSupport.java:100)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerSupport.notifyEntityOwnershipListeners(EntityOwnershipListenerSupport.java:89)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard.lambda$notifyAllListeners$1(EntityOwnershipShard.java:290)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard.searchForEntities(EntityOwnershipShard.java:475)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard.notifyAllListeners(EntityOwnershipShard.java:272)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard.onStateChanged(EntityOwnershipShard.java:308)[169:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleBehaviorChange(RaftActor.java:465)[164:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.switchBehavior(RaftActor.java:398)[164:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:297)[164:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[163:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[157:com.typesafe.akka.persistence:2.4.7]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[163:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[151:com.typesafe.akka.actor:2.4.7]
at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[151:com.typesafe.akka.actor:2.4.7]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[157:com.typesafe.akka.persistence:2.4.7]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[157:com.typesafe.akka.persistence:2.4.7]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[157:com.typesafe.akka.persistence:2.4.7]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[157:com.typesafe.akka.persistence:2.4.7]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[151:com.typesafe.akka.actor:2.4.7]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[151:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[151:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[151:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[151:com.typesafe.akka.actor:2.4.7]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[147:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[147:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[147:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[147:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
2016-08-25 11:30:46,841 | WARN | lt-dispatcher-17 | OneForOneStrategy | 152 - com.typesafe.akka.slf4j - 2.4.7 | Invalid combination of wasOwner: true, isOwner: true, hasOwner: true
2016-08-25 11:30:47,346 | WARN | lt-dispatcher-14 | Shard | 164 - org.opendaylight.controller.sal-akka-raft - 1.5.0.SNAPSHOT | member-3-shard-inventory-operational: At least 1 followers need to be active, Switching member-3-shard-inventory-operational from Leader to IsolatedLeader



 Comments   
Comment by Vaclav Demcak [ 26/Aug/16 ]

EntityOwnershipChangeState doesn't support the combination wasOwner=true, isOwner=true, hasOwner=true. This message comes up because we would like to inform about the inJeopardy=true flag. A hotfix could provide a new state for EntityOwnershipChangeState, LOCAL_OWNERSHIP_GRANTED_NOT_CHANGE, but we have to think about a way to move the inJeopardy flag into EntityOwnershipChangeState.
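
For illustration, a minimal sketch of how the proposed constant might slot into the state mapping, assuming the three-flag from() factory visible in the stack trace. This is a simplified stand-in, not the actual source of org.opendaylight.mdsal.eos.common.api.EntityOwnershipChangeState:

// Simplified stand-in for EntityOwnershipChangeState (illustrative only).
enum OwnershipChangeState {
    LOCAL_OWNERSHIP_GRANTED,            // wasOwner=false, isOwner=true
    LOCAL_OWNERSHIP_GRANTED_NOT_CHANGE, // proposed: wasOwner=true, isOwner=true
    LOCAL_OWNERSHIP_LOST_NEW_OWNER,     // wasOwner=true, isOwner=false, hasOwner=true
    LOCAL_OWNERSHIP_LOST_NO_OWNER,      // wasOwner=true, isOwner=false, hasOwner=false
    REMOTE_OWNERSHIP_CHANGED;           // wasOwner=false, isOwner=false

    static OwnershipChangeState from(boolean wasOwner, boolean isOwner, boolean hasOwner) {
        if (isOwner) {
            // This is the branch that currently throws IllegalArgumentException
            // for wasOwner=true; the proposed constant would cover it instead.
            return wasOwner ? LOCAL_OWNERSHIP_GRANTED_NOT_CHANGE : LOCAL_OWNERSHIP_GRANTED;
        }
        if (wasOwner) {
            return hasOwner ? LOCAL_OWNERSHIP_LOST_NEW_OWNER : LOCAL_OWNERSHIP_LOST_NO_OWNER;
        }
        return REMOTE_OWNERSHIP_CHANGED;
    }
}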

Comment by Martin Mihálek [ 26/Aug/16 ]

While using a 3-node cluster with the cluster singleton service, it may get into an unresponsive election state after these steps:

1. Isolate current leader (member-3)
2. Member-3 closes its services
3. New Election selects leader (member-1)
4. Member-1 init its services
5. Discard isolation of member-3
6. Member-1 closes its services and the only candidate for election is member-1

No new member can be elected, thus no services will be brought up from this point.

Comment by Vaclav Demcak [ 26/Aug/16 ]

(In reply to Vaclav Demcak from comment #1)
> EntityOwnershipChangeState doesn't support the combination wasOwner=true,
> isOwner=true, hasOwner=true. This message comes up because we would like to
> inform about the inJeopardy=true flag. A hotfix could provide a new state for
> EntityOwnershipChangeState, LOCAL_OWNERSHIP_GRANTED_NOT_CHANGE, but we have
> to think about a way to move the inJeopardy flag into EntityOwnershipChangeState.

https://git.opendaylight.org/gerrit/#/c/44673/

Comment by Vaclav Demcak [ 30/Aug/16 ]

Attachment member3_karaf.log has been added with description: IsolatedLeader

Comment by Vaclav Demcak [ 30/Aug/16 ]

Attachment member2_karaf.log has been added with description: member2_follower

Comment by Vaclav Demcak [ 30/Aug/16 ]

Attachment member1_karaf.log has been added with description: member1_follower

Comment by Andrej Leitner [ 31/Aug/16 ]

(In reply to Martin Mihálek from comment #2)
> While using a 3-node cluster with the cluster singleton service, it may get
> into an unresponsive election state after these steps:
>
> 1. Isolate current leader (member-3)
> 2. Member-3 closes its services
> 3. New Election selects leader (member-1)
> 4. Member-1 init its services
> 5. Discard isolation of member-3
> 6. Member-1 closes its services and the only candidate for election is member-1
>
> No new member can be elected, thus no services will be brought up from this
> point.

Observed the same while testing openflowplugin.

Comment by A H [ 02/Sep/16 ]

Is there an ETA for this bug, and is someone assigned to fix it?

Comment by Colin Dixon [ 02/Sep/16 ]

Talking with TomP, he said that this issue is complicated and he has already pushed one patch, but he's not certain it can be fixed by the release.

The silver lining is that this bug only manifests when nodes are isolated, which means we could consider changing this to a critical bug instead of a blocker if we are willing to note that behavior during network partitions might be undefined.

Comment by Colin Dixon [ 02/Sep/16 ]

I'm reducing this to critical as TomP also says that this is not a regression from Beryllium.

Comment by Colin Dixon [ 02/Sep/16 ]

Do we think this bug is causing BUG-6554?

Comment by Luis Gomez [ 02/Sep/16 ]

No, this bug occurs without cluster member isolation and even in single-instance tests. The bug you point to seems to be related to https://bugs.opendaylight.org/show_bug.cgi?id=6177, which is not a blocker (it is a major bug).

Comment by Luis Gomez [ 02/Sep/16 ]

Sorry, the last comment was intended for the other bug. What I mean is this bug can only be responsible for a major (not critical) bug: https://bugs.opendaylight.org/show_bug.cgi?id=6177.

Comment by Tom Pantelis [ 02/Sep/16 ]

I've found several issues looking at the logs.

One is that listeners (DTCL and DCL) are not notified when a snapshot is installed by the leader on a follower. This resulted in member-3 losing (or not re-gaining) its candidate registration for the ServiceEntityType when the network partition was healed. This looks like a regression in Boron. I have submitted https://git.opendaylight.org/gerrit/#/c/45028/ to fix this.
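
Conceptually, the fix amounts to treating a snapshot install like any other commit: diff the pre- and post-snapshot state and fire listener notifications for the delta. A rough sketch under that assumption (hypothetical names and simplified types; not the code in the patch):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch: on snapshot install, diff old vs. new state and notify
// listeners of the delta, just as a normal commit would. Modified-in-place
// entries are omitted for brevity.
class SnapshotNotificationSketch {
    interface ChangeListener {
        void onDataChanged(Map<String, String> removed, Map<String, String> added);
    }

    void applySnapshot(Map<String, String> current, Map<String, String> snapshot,
            List<ChangeListener> listeners) {
        Map<String, String> removed = new HashMap<>(current);
        removed.keySet().removeAll(snapshot.keySet());
        Map<String, String> added = new HashMap<>(snapshot);
        added.keySet().removeAll(current.keySet());
        // Without this step, a follower receiving a snapshot never learns that
        // e.g. its candidate registration for an entity has disappeared.
        for (ChangeListener listener : listeners) {
            listener.onDataChanged(removed, added);
        }
        current.clear();
        current.putAll(snapshot);
    }
}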

The fact that the new leader, member-2, tried to install a snapshot is a result of another orthogonal bug that's been there all along (not a regression). The old leader, member-3, had a transaction journal entry at index 6, term 1 that was appended when it was isolated and thus wasn't replicated. Meanwhile, on the other side of the partition, member-2 became leader and committed its own entry at index 6 but with term 2. When the partition healed, member-3 switched to follower as it should. When syncing with the new leader, member-2, member-3's entry at index 6 should have immediately been deemed a conflict with member-2's index 6 entry since the terms don't match, and member-3's entry should've been removed/replaced by member-2's entry. This did eventually happen (via the snapshot); however, member-3's entry was first committed and applied to the state. This violates raft: an entry was committed without being replicated to a majority of the followers. I have an idea on how to fix this.
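
For reference, the raft conflict rule at play, as a conceptual sketch (not sal-akka-raft's actual implementation): a follower must truncate a local entry whose term conflicts with the leader's entry at the same index before anything at or beyond that index is committed.

import java.util.List;

// Conceptual sketch of raft's AppendEntries conflict rule; illustrative only.
// Log indices are 0-based here for simplicity.
class RaftConflictSketch {
    static final class LogEntry {
        final long index;
        final long term;
        LogEntry(long index, long term) { this.index = index; this.term = term; }
    }

    static void reconcile(List<LogEntry> followerLog, LogEntry leaderEntry) {
        int pos = (int) leaderEntry.index;
        if (pos < followerLog.size()) {
            if (followerLog.get(pos).term == leaderEntry.term) {
                return; // same entry, nothing to do
            }
            // Conflict: same index, different term. The follower's entry (and all
            // entries after it) must be removed BEFORE anything at or beyond this
            // index is committed or applied. In the scenario above, member-3's
            // index-6/term-1 entry should have been dropped here in favor of
            // member-2's index-6/term-2 entry, not committed first.
            followerLog.subList(pos, followerLog.size()).clear();
        }
        followerLog.add(leaderEntry);
    }
}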

The major issue that led to all this was a result of the leader removing a member as candidate when akka notifies it that the member node is down. This results in another member being selected as owner. I believe initially we just selected a new owner, ignoring the down member. However, if the down member's process was actually down, this would leave a stale candidate when the member was restarted and no client on that member actually registered a candidate. Therefore the leader now removes the down member as a candidate. To handle the partition case where the member process is still running, when it reconnects it gets the update that its candidate was removed and re-registers it if a local candidate still exists. However, this behavior is problematic in the case when the shard leader is isolated. The majority partition will elect a new leader, which temporarily results in split-brain and 2 leaders which independently attempt to remove the other side's candidates. When the partition is healed, all hell breaks loose trying to reconcile their differences. This is compounded with the singleton service because it uses 2 entities that are related to one another. I'm working on ideas to alleviate this issue.
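
The down-member handling described above might look roughly like the following sketch; the method and field names are hypothetical, and the real EntityOwnershipShard logic is considerably more involved.

import java.util.Map;
import java.util.Set;

// Rough sketch of the leader-side handling of a downed member
// (hypothetical names; not EntityOwnershipShard's actual code).
class PeerDownSketch {
    void onPeerDown(String downMember, Map<String, Set<String>> candidatesByEntity,
            Map<String, String> ownerByEntity) {
        for (Map.Entry<String, Set<String>> entry : candidatesByEntity.entrySet()) {
            // Remove the down member as candidate so that a restarted process with
            // no local registration does not linger as a stale candidate...
            if (entry.getValue().remove(downMember)
                    && downMember.equals(ownerByEntity.get(entry.getKey()))) {
                // ...and select a new owner from the remaining candidates.
                String newOwner = entry.getValue().stream().findFirst().orElse("");
                ownerByEntity.put(entry.getKey(), newOwner);
            }
        }
        // During a partition both sides have a leader: the isolated leader and the
        // newly elected one each run this against the other side's members, and the
        // conflicting removals must somehow be reconciled when the partition heals.
    }
}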

Also, from the logs, it looks like the internal DTCLs used by the EOS shard didn't get notified of all state transitions. This is because of the batching of updates into a single transaction when one is already in flight. This was done for efficiency, but I think state transitions can be lost when combined in the same transaction because of compression in the DataTreeModification. So if a leaf is initially empty, then set to "foo" and then set to "bar" by another operation in the same transaction, the end result is that listeners will only observe the transition from "" -> "bar". I have to verify this.
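
A minimal, self-contained illustration of the suspected loss, with a plain Map standing in for the data tree (this is not the DataTreeModification API):

import java.util.HashMap;
import java.util.Map;

// Illustrates how coalescing two updates into one batched transaction hides the
// intermediate state from listeners that only see before/after snapshots.
public class BatchedTransitionLossDemo {
    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("leaf", "");

        // Two transitions queued while a transaction is in flight get merged
        // into a single batched transaction:
        Map<String, String> after = new HashMap<>(before);
        after.put("leaf", "foo"); // transition 1: "" -> "foo"
        after.put("leaf", "bar"); // transition 2: "foo" -> "bar" (overwrites "foo")

        // A listener diffing the snapshots observes only "" -> "bar"; the
        // intermediate "foo" state (e.g. an ownership hand-off step) is lost.
        System.out.println("listener sees: '" + before.get("leaf") + "' -> '"
                + after.get("leaf") + "'");
    }
}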

Comment by Tom Pantelis [ 04/Sep/16 ]

Submitted https://git.opendaylight.org/gerrit/#/c/45129/ to rework the behavior of re-assigning owners on peer node down. This also fixes a couple other issues revealed by the changes and unit tests.

Comment by Viera Zelcamova [ 05/Sep/16 ]

Martin, can you try the patch? Thanks.

Comment by Martin Mihálek [ 05/Sep/16 ]

Used updated sal-distributed-datastore and sal-akka-raft from the provided patch;
the problem still remains: after isolation of a node, the first leader does not close its services. Logs attached as "members.zip".

Comment by Martin Mihálek [ 05/Sep/16 ]

Attachment members.zip has been added with description: Logs using patch https://git.opendaylight.org/gerrit/#/c/45129/

Comment by Tom Pantelis [ 06/Sep/16 ]

(In reply to Martin Mihálek from comment #17)
> Used updated sal-distributed-datastore and sal-akka-raft from the provided
> patch; the problem still remains: after isolation of a node, the first leader
> does not close its services. Logs attached as "members.zip".

It failed b/c of the original error:

java.lang.IllegalArgumentException: Invalid combination of wasOwner: true, isOwner: true, hasOwner: true
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)[38:com.google.guava:18.0.0]
at org.opendaylight.mdsal.eos.common.api.EntityOwnershipChangeState.from(EntityOwnershipChangeState.java:100)[129:org.opendaylight.mdsal.eos-common-api:2.2.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerSupport.notifyListeners(EntityOwnershipListenerSupport.java:107)[171:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]

You must be missing https://git.opendaylight.org/gerrit/#/c/44673/.

With these patches, the scenario of isolating the EOS shard leader with the singleton service entity owned by the shard leader should work. In this case, the EOS shard leader and entity owner is member-3. After isolation, either member-1 or member-2 should become owner. After the isolation is removed, nothing should happen and member-1 or member-2 should remain owner.

However, the scenario where member-3 is the EOS shard leader but either member-1 or member-2 is the singleton service entity owner will probably run into similar issues as before and may not work correctly. This requires more work.

Also, in the scenario where an EOS shard follower is the singleton service entity owner and is isolated, it will not get notified of inJeopardy. This was never implemented.

Comment by Ryan Goulding [ 06/Sep/16 ]

Per discussion in the Core Projects call, this is being escalated to blocker status. These patches should be included in Boron.

Comment by Martin Mihálek [ 07/Sep/16 ]

I updated features: sal-akka-raft, sal-distributed-datastore, mdsal-eos-common-api
with changes: https://git.opendaylight.org/gerrit/#/c/44673/6
https://git.opendaylight.org/gerrit/#/c/45129/4

Tested Situation:

1. Cluster start, member 3 elected as leader
2. Isolate member 3
3. Member 3 closes its services
4. Member 2 elected as new leader
5. Remove isolation of member 3
6. Member 2 still remains as leader
7. Isolate member 2
8. Member 2 does not close its services,
and the other nodes do not start an election
9. Remove isolation of member 2
10. Member 2 closes its services
11. Services on all nodes are closed; no leader is elected

Logs attached, using debug on:
org.opendaylight.mdsal.singleton.dom.impl
org.opendaylight.controller.cluster.datastore.entityownership

Comment by Martin Mihálek [ 07/Sep/16 ]

Attachment members_1.zip has been added with description: Cluster members logs

Comment by Tom Pantelis [ 07/Sep/16 ]

>
> Tested Situation:
>
> 1. Cluster start, member 3 elected as leader
> 2. Isolate member 3
> 3. Member 3 closes its services
> 4. Member 2 elected as new leader
> 5. Remove isolation of member 3
> 6. Member 2 still remains as leader

These steps worked correctly - this is what the patches were intended to address.
Note that the term "leader" above really means "entity owner", not shard leader. We should not mix the two terms, to avoid confusion. Rewriting the steps above:

1. Cluster start, member 3 elected as EOS shard leader
2. Member 3 chosen as service owner
3. Isolate member 3
4. Member 3 closes its service due to inJeopardy
5. Member 1 elected as new EOS shard leader
6. Member 2 chosen as new service owner
7. Remove isolation of member 3
8. Member 2 remains as service owner

> 7. Isolate member 2
> 8. Member 2 does not close its services,
> and the other nodes do not start an election

Since member 2 is a follower, we don't currently notify of inJeopardy in this case (this was never implemented).

Member 3 was actually selected as the new owner of the ServiceEntityType and the DOMClusterSingletonServiceProviderImpl was notified:

2016-09-07 09:56:28,155 | DEBUG | on-dispatcher-55 | EntityOwnerChangeListener | 217 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | member-3-shard-entity-ownership-operational: New owner: member-3, Original owner: member-2

2016-09-07 09:56:28,155 | DEBUG | lt-dispatcher-17 | EntityOwnershipListenerActor | 217 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Notifying EntityOwnershipListener org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl@289870b7: DOMEntityOwnershipChange [entity=DOMEntity [type=org.opendaylight.mdsal.ServiceEntityType, id=/(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)entity/entity[{(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)name=org.opendaylight.controller.config.yang.sxp.controller.conf.SxpControllerInstance}]], state=LOCAL_OWNERSHIP_GRANTED [wasOwner=false, isOwner=true, hasOwner=true], inJeopardy=false]

However, the ClusterSingletonServiceGroupImpl was not notified. It seems that when it loses ownership it stops receiving updates via the DOMClusterSingletonServiceProviderImpl. I think this is due to it removing itself from the allServiceGroups in newAsyncCloseCallback. So this is a bug in the ClusterSingletonServiceGroupImpl - not sure what the intent is - I'll punt to Vaclav...
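
The suspected pattern, sketched with simplified types; allServiceGroups and newAsyncCloseCallback are the names cited above, everything else is illustrative (not the actual ClusterSingletonServiceGroupImpl code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the suspected bug: if the close callback removes the
// group from the provider's registry, later ownership-change notifications
// have no group to dispatch to.
class SingletonGroupRegistrySketch {
    // Mirrors the allServiceGroups map referenced above.
    private final Map<String, Runnable> allServiceGroups = new ConcurrentHashMap<>();

    void onOwnershipChanged(String groupId) {
        Runnable group = allServiceGroups.get(groupId);
        if (group == null) {
            // If the close callback already removed the group (as suspected in
            // newAsyncCloseCallback), a later LOCAL_OWNERSHIP_GRANTED for the
            // same group is silently dropped and the service never restarts.
            return;
        }
        group.run();
    }

    void closeGroup(String groupId) {
        // Removing the group here makes it unreachable for future notifications.
        allServiceGroups.remove(groupId);
    }
}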

Comment by A H [ 08/Sep/16 ]

Has this bug been verified as fixed in the latest Boron RC 3.1 Build?

Comment by Tom Pantelis [ 08/Sep/16 ]

(In reply to A H from comment #23)
> Has this bug been verified as fixed in the latest Boron RC 3.1 Build?

It seems what we intended to fix for the RC with the patches has been verified. There are still other scenarios that don't work correctly, as I outlined in my comments, but we'll address those in SR1. We can downgrade this to critical for now.

Comment by Tom Pantelis [ 13/Sep/16 ]

Submitted
https://git.opendaylight.org/gerrit/#/c/45515/
https://git.opendaylight.org/gerrit/#/c/45516/

to address other issues outlined in previous comments.

Comment by Martin Ciglan [ 19/Sep/16 ]

Is this ready for review or still in progress? Thanks for the update.

Comment by Tom Pantelis [ 19/Sep/16 ]

(In reply to Martin Ciglan from comment #26)
> is this ready for review or still in progress? Thanks for update.

The patches above have been in review for a few days, but no one has reviewed them yet. If you want to review, that would be great. I intend to backport to stable/boron.

Comment by Martin Mihálek [ 19/Sep/16 ]

I updated features: sal-akka-raft, sal-distributed-datastore

with change: https://git.opendaylight.org/gerrit/#/c/45638/1

Tested Situation:

1. Cluster start, member 1 hosts services
2. Isolate member 1
3. Member 1 closes its services
4. Member 2 elected and hosting services
5. Remove isolation of member 1
6. Member 2 still remains as leader
7. Isolate member 2
8. Member 2 does not close its services,
and the other nodes do not start an election

Logs attached, using debug on:
org.opendaylight.mdsal.singleton.dom.impl
org.opendaylight.controller.cluster.datastore.entityownership

Comment by Martin Mihálek [ 19/Sep/16 ]

Attachment member_logs.zip has been added with description: Cluster member logs

Comment by Tom Pantelis [ 20/Sep/16 ]

(In reply to Martin Mihálek from comment #28)
> Created attachment 1227 [details]
> Cluster member logs
>
> I updated features: sal-akka-raft, sal-distributed-datastore
>
> with change: https://git.opendaylight.org/gerrit/#/c/45638/1
>
> Tested Situation:
>
> 1. Cluster start, member 1 hosts services
> 2. Isolate member 1
> 3. Member 1 closes its services
> 4. Member 2 elected and hosting services
> 5. Remove isolation of member 1
> 6. Member 2 still remains as leader
> 7. Isolate member 2
> 8. Member 2 does not close its services,
> and the other nodes do not start an election
>
> Logs attached, using debug on:
> org.opendaylight.mdsal.singleton.dom.impl
> org.opendaylight.controller.cluster.datastore.entityownership

This is the same scenario tested in Comment 21 (https://bugs.opendaylight.org/show_bug.cgi?id=6540#c21) and the same issue I noted in Comment 22 (https://bugs.opendaylight.org/show_bug.cgi?id=6540#c22). Re-assigning to Vaclav.

Comment by Tom Pantelis [ 20/Sep/16 ]

I should mention that https://git.opendaylight.org/gerrit/#/c/45638/ and related patches address this scenario:

1. Cluster start, member 1 elected as EOS shard leader
2. Member 3 chosen as service owner
3. Isolate member 1
4. Member 3 remains as service owner
5. Remove isolation of member 1
6. Member 3 remains as service owner

So in this case, isolating/un-isolating leader member-1 shouldn't result in any service owner changes.

Comment by Michal Rehak [ 27/Sep/16 ]

Vaclav added a fix to mdsal:
https://git.opendaylight.org/gerrit/#/c/46175/

Testing results:

  • if the shard leader for EOS and the service owner (SXP) are co-located, then it works as expected:
    • after EOS leader isolation:
      • election occurs
      • old leader: Leader -> IsolatedLeader, service is closed
      • new leader: Follower -> Leader, service is instantiated
    • after cluster heal:
      • isolated leader: IsolatedLeader -> Follower
  • but if the shard leader for EOS and the service owner (SXP) are NOT co-located, then it behaves like this:
    • after EOS follower isolation:
      • old follower: Follower -> Candidate, service is NOT closed
      • another follower or the leader: service is instantiated
    • after cluster heal:
      • often an election occurs
      • isolated follower: Candidate -> Follower, service is closed

In the latter scenario, the SXP service was running simultaneously on 2 cluster nodes during the isolation.

We also exercised the first scenario by killing karaf instead of touching iptables. The result is the same. So I guess we need to focus on the situation where a Follower hosting the service owner gets isolated and keeps the service active.

Comment by Andrej Leitner [ 10/Oct/16 ]

Hi all,
could anybody review and merge Tom's last patch:
https://git.opendaylight.org/gerrit/#/c/45638

Thanks.
