[OVSDB-292] Clustering: DataStoreUnavailableException when shard leader is killed Created: 08/Feb/16  Updated: 02/Feb/17  Resolved: 02/Feb/17

Status: Resolved
Project: ovsdb
Component/s: Clustering
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Anil Vishnoi Assignee: ranjithkumar_t
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 5274
Priority: Normal

 Description   

The southbound plugin throws a DataStoreUnavailableException during a read transaction when the shard leader node is killed. This only happens when the shard leader node is killed abruptly rather than shut down gracefully.
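For illustration, here is a minimal sketch (not the plugin's actual code) of the kind of operational read on which the exception surfaces, assuming an injected DataBroker; the whole-topology path and the retry comment are assumptions:

import com.google.common.base.Optional;
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.ReadOnlyTransaction;
import org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.controller.md.sal.common.api.data.ReadFailedException;
import org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

void readOperationalTopology(DataBroker dataBroker) {
    // Example path: the whole network-topology subtree; the plugin reads narrower paths.
    InstanceIdentifier<NetworkTopology> path = InstanceIdentifier.create(NetworkTopology.class);
    ReadOnlyTransaction tx = dataBroker.newReadOnlyTransaction();
    try {
        Optional<NetworkTopology> topology =
                tx.read(LogicalDatastoreType.OPERATIONAL, path).checkedGet();
    } catch (ReadFailedException e) {
        // While the shard has no leader, the cause chain carries
        // DataStoreUnavailableException / NoShardLeaderException
        // ("... currently has no leader. Try again later.").
        if (e.getCause() instanceof DataStoreUnavailableException) {
            // Assumed handling: back off and retry after leader re-election.
        }
    } finally {
        tx.close();
    }
}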



 Comments   
Comment by Anil Vishnoi [ 09/Feb/16 ]

Lowering the priority; I am not able to recreate this issue.

Comment by ranjithkumar_t [ 07/Apr/16 ]

Hi Anil,

For recreating the bug, the following steps were followed:

1. Cloned the OVSDB project and switched to the stable/beryllium branch.
2. Repeated the above step on all 3 controllers and copied the cluster configuration file to each.
3. Installed the required features for clustering and the southbound plugin (a typical feature set is sketched after these steps).
4. HA takes over and the initial network topology is populated on the operational side.
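The exact feature list is not recorded above; on a Beryllium distribution, a typical install from the Karaf console would look like the following (the feature names are an assumption based on the setup described here and in the later comment):

opendaylight-user@root>feature:install odl-mdsal-clustering
opendaylight-user@root>feature:install odl-ovsdb-southbound-impl-rest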

The following virtual machines were used to create the 3-node OVSDB cluster: 10.0.0.1, 10.0.0.2, 10.0.0.3.
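For step 2, the cluster configuration file in question is configuration/initial/akka.conf; a minimal sketch of the relevant part for the member running on 10.0.0.1 is shown below (actor-system name and port are the ODL defaults; treat the exact layout as an assumption for this release):

odl-cluster-data {
  akka {
    remote {
      netty.tcp {
        hostname = "10.0.0.1"   # this member's own address
        port = 2550
      }
    }
    cluster {
      seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.0.0.1:2550",
                    "akka.tcp://opendaylight-cluster-data@10.0.0.2:2550",
                    "akka.tcp://opendaylight-cluster-data@10.0.0.3:2550"]
      roles = ["member-1"]
    }
  }
}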

Observation 1 (leader node 10.0.0.1 down) – One Node Failure

1. The leader node is 10.0.0.1; the remaining two nodes are followers.
2. In the 3-node cluster, if the leader node goes down, failover (HA) happens across the other two nodes and I do not get any exception such as "DataStoreUnavailableException".
3. There is no change on the operational side.

Observation 2 (leader node 10.0.0.2 down) – Two Node Failure

1. Now the leader node is 10.0.0.2 and the one remaining node is a follower.
2. With only 2 nodes of the cluster left, if the leader node goes down I get the exception "DataStoreUnavailableException" when reading the operational datastore:

http://10.0.0.3:8181/restconf/operational/network-topology:network-topology/

org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-3-shard-topology-operational currently has no leader. Try again later.
at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.executeRead(NoOpTransactionContext.java:67)
... 23 more
Caused by: org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-3-shard-topology-operational currently has no leader. Try again later.
at org.opendaylight.controller.cluster.datastore.ShardManager.createNoShardLeaderException(ShardManager.java:744)
at org.opendaylight.controller.cluster.datastore.ShardManager.onShardNotInitializedTimeout(ShardManager.java:551)

I am not sure whether a cluster reduced to a single member (a singleton) is supported in the 3-node OVSDB clustering mode.
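For what it's worth, both observations line up with plain Raft majority arithmetic (general reasoning, not specific to the ODL implementation): the quorum is computed over the configured members, not the live ones.

majority(n) = floor(n/2) + 1
configured members n = 3     -> majority = 2
kill 1 of 3 (2 still alive)  -> 2 >= 2, a new leader can be elected  (Observation 1)
kill 2 of 3 (1 still alive)  -> 1 <  2, no leader can be elected     (Observation 2)

So a single surviving member of a 3-node cluster cannot elect itself leader, which matches the NoShardLeaderException seen above.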

Comment by Jamo Luhrsen [ 03/May/16 ]

I have tried to reproduce this with Beryllium SR2 candidate distro [0].

  • 3 node cluster
  • features: odl-jolokia, odl-ovsdb-southbound-impl-rest
  • mininet topo,1,3 connected and verified in operational/network-topology:network-topology/topology/ovsdb:1/
  • kill -9 on operational topology shard leader (leader identified via Jolokia, as shown below)
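For reference, the shard leader in the last step can be read through the odl-jolokia endpoint installed above; the MBean naming follows the usual ODL clustering monitoring pattern, and member-1 is only an example:

http://10.0.0.1:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-topology-operational,type=DistributedOperationalDatastore

The Leader attribute in the returned JSON names the member currently leading the shard.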

I tried this several times and never saw the DataStoreUnavailableException. I did, however, see many of these IllegalStateExceptions:

2016-05-03 17:43:56,597 | INFO | ult-dispatcher-5 | ShardManager | 185 - org.opendaylight.controller.sal-distributed-datastore - 1.3.2.Beryllium-SR2 | shard-manager-operational Received follower initial sync status for member-1-shard-toaster-operational status sync done true
2016-05-03 17:43:56,600 | ERROR | ult-dispatcher-2 | SimpleShardDataTreeCohort | 185 - org.opendaylight.controller.sal-distributed-datastore - 1.3.2.Beryllium-SR2 | Transaction failed to commit
java.lang.IllegalStateException: Store tree org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@7518b3f0 and candidate base org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@dc0cde7 differ.
at org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTree.commit(InMemoryDataTree.java:119)[100:org.opendaylight.yangtools.yang-data-impl:0.8.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.commit(SimpleShardDataTreeCohort.java:106)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator$CohortEntry.commit(ShardCommitCoordinator.java:670)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.Shard.finishCommit(Shard.java:351)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.Shard.finishCommit(Shard.java:415)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.Shard.applyState(Shard.java:663)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:224)[182:org.opendaylight.controller.sal-akka-raft:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:36)[181:org.opendaylight.controller.sal-clustering-commons:1.3.2.Beryllium-SR2]
at org.opendaylight.controller.cluster.datastore.Shard.onReceiveCommand(Shard.java:275)[185:org.opendaylight.controller.sal-distributed-datastore:1.3.2.Beryllium-SR2]
at akka.persistence.UntypedPersistentActor.onReceive(Eventsourced.scala:430)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[181:org.opendaylight.controller.sal-clustering-commons:1.3.2.Beryllium-SR2]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:534)[170:com.typesafe.akka.actor:2.3.14]
at akka.persistence.Recovery$State$class.process(Recovery.scala:30)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.ProcessorImpl$$anon$2.process(Processor.scala:103)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.ProcessorImpl$$anon$2.aroundReceive(Processor.scala:114)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.Eventsourced$$anon$2.doAroundReceive(Eventsourced.scala:82)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.Eventsourced$$anon$2.aroundReceive(Eventsourced.scala:78)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.persistence.UntypedPersistentActor.aroundReceive(Eventsourced.scala:428)[175:com.typesafe.akka.persistence.experimental:2.3.14]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)[170:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.invoke(ActorCell.scala:487)[170:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)[170:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.run(Mailbox.scala:220)[170:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[170:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[167:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[167:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[167:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[167:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]

[0] https://wiki.opendaylight.org/view/Simultaneous_Release:Beryllium_Release_Plan#Beryllium_SR2_Download

Comment by Anil Vishnoi [ 02/Feb/17 ]

Looks like this issue is not recreatable. Please re-open the issue if you are able to recreate it.
