[CONTROLLER-2026] Circuit breaker timeout Created: 16/Nov/21  Updated: 02/May/23

Status: Confirmed
Project: controller
Component/s: clustering
Affects Version/s: 4.0.7
Fix Version/s: None

Type: Bug Priority: High
Reporter: Sangwook Ha Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to CONTROLLER-2043 Circuit breaker timeout with BGP and ... Confirmed

 Description   

The OpenFlow Plugin longevity test, which generates a large volume of updates to the inventory datastore, fails after a circuit breaker timeout:

2021-11-16T03:18:22,156 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-15 | Shard                            | 206 - org.opendaylight.controller.repackaged-akka - 4.0.7 | Failed to persist event type [org.opendaylight.controller.cluster.raft.persisted.SimpleReplicatedLogEntry] with sequence number [222242] for persistenceId [member-1-shard-inventory-config].
akka.pattern.CircuitBreaker$$anon$13: Circuit Breaker Timed out.
2021-11-16T03:18:22,167 | INFO  | opendaylight-cluster-data-shard-dispatcher-33 | Shard                            | 211 - org.opendaylight.controller.sal-clustering-commons - 4.0.7 | Stopping Shard member-1-shard-inventory-config
2021-11-16T03:18:22,169 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-14 | LocalThreePhaseCommitCohort      | 213 - org.opendaylight.controller.sal-distributed-datastore - 4.0.7 | Failed to prepare transaction member-1-datastore-config-fe-0-txn-111328-0 on backend
java.lang.RuntimeException: Transaction aborted due to shutdown.
	at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.abortPendingTransactions(ShardCommitCoordinator.java:400) ~[bundleFile:?]
	at org.opendaylight.controller.cluster.datastore.Shard.postStop(Shard.java:322) ~[bundleFile:?]
	at akka.actor.Actor.aroundPostStop(Actor.scala:556) ~[bundleFile:?]
	at akka.actor.Actor.aroundPostStop$(Actor.scala:556) ~[bundleFile:?]
	at akka.persistence.AbstractPersistentActor.akka$persistence$Eventsourced$$super$aroundPostStop(PersistentActor.scala:295) ~[bundleFile:?]
	at akka.persistence.Eventsourced.aroundPostStop(Eventsourced.scala:304) ~[bundleFile:?]
	at akka.persistence.Eventsourced.aroundPostStop$(Eventsourced.scala:300) ~[bundleFile:?]
	at akka.persistence.AbstractPersistentActor.aroundPostStop(PersistentActor.scala:295) ~[bundleFile:?]
	at akka.actor.dungeon.FaultHandling.finishTerminate(FaultHandling.scala:240) ~[bundleFile:?]
	at akka.actor.dungeon.FaultHandling.handleChildTerminated(FaultHandling.scala:329) ~[bundleFile:?]
	at akka.actor.dungeon.FaultHandling.handleChildTerminated$(FaultHandling.scala:308) ~[bundleFile:?]
	at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:411) ~[bundleFile:?]
	at akka.actor.dungeon.DeathWatch.watchedActorTerminated(DeathWatch.scala:102) ~[bundleFile:?]
	at akka.actor.dungeon.DeathWatch.watchedActorTerminated$(DeathWatch.scala:87) ~[bundleFile:?]
	at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:411) ~[bundleFile:?]
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:513) ~[bundleFile:?]
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:536) ~[bundleFile:?]
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:295) ~[bundleFile:?]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:273) ~[bundleFile:?]
	at akka.dispatch.Mailbox.run(Mailbox.scala:231) ~[bundleFile:?]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:243) ~[bundleFile:?]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
	at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
	at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]

This happens on all three releases: Silicon, Phosphorus, and Sulfur.
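
For triage, the failure signature in the log excerpt above can be searched for mechanically. The following is a minimal sketch, not part of the original report: it assumes the karaf log line layout shown in the excerpt, and the names `find_persist_failures`, `PERSIST_FAILURE`, and `BREAKER_TIMEOUT` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, taken from the excerpt above:
#   <timestamp> | ERROR | ... | Failed to persist event type [<class>]
#   with sequence number [<n>] for persistenceId [<shard>].
PERSIST_FAILURE = re.compile(
    r"^(?P<ts>\S+) \| ERROR .*Failed to persist event type \[(?P<event>[^\]]+)\] "
    r"with sequence number \[(?P<seq>\d+)\] for persistenceId \[(?P<pid>[^\]]+)\]"
)
BREAKER_TIMEOUT = "Circuit Breaker Timed out"


def find_persist_failures(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, persistenceId, sequence number) for every persist
    failure whose immediately following line reports a circuit breaker timeout."""
    hits: List[Tuple[str, str, int]] = []
    pending = None  # persist failure seen on the previous line, if any
    for line in lines:
        if pending is not None and BREAKER_TIMEOUT in line:
            hits.append(pending)
        pending = None
        match = PERSIST_FAILURE.match(line)
        if match:
            pending = (match.group("ts"), match.group("pid"), int(match.group("seq")))
    return hits
```

A compressed log such as the odl1_karaf.log.gz linked in the comments could be passed in as `gzip.open(path, "rt", errors="replace")`, since the function accepts any iterable of lines.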



 Comments   
Comment by Sangwook Ha [ 23/Nov/21 ]

This issue was also found with openflowplugin-csit-1node-scale-link-only-silicon after migrating to RFC 8040 URLs:

https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-1node-scale-link-only-silicon/505/odl_1/odl1_karaf.log.gz

The maximum number of links is now just 1560; it used to be 2652 with 52 switches.

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-scale-link-only-silicon/plot/Link%20Scalability/

Comment by Robert Varga [ 02/May/23 ]

This very much smells like a duplicate of CONTROLLER-2043.

Generated at Wed Feb 07 19:57:01 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.