[CONTROLLER-2036] Failure of initial removal of candidates from previous iteration Created: 05/Apr/22 Updated: 06/Apr/22 Resolved: 06/Apr/22 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | 5.0.1 |
| Fix Version/s: | 5.0.2 |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sangwook Ha | Assignee: | Sangwook Ha |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Found in openflowplugin-csit-3node-clustering-bulkomatic-only-sulfur/191/

After the cluster is restarted, one instance repeatedly logs the following warning for about 5 minutes, until the leader is killed in a different test case:

2022-04-05T02:58:18,683 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-45 | CandidateRegistryInit | 202 - org.opendaylight.controller.eos-dom-akka - 5.0.1 | member-3 : Initial removal of candidates from previous iteration failed. Rescheduling.
java.util.concurrent.TimeoutException: Ask timed out on [Actor[akka://opendaylight-cluster-data/system/singletonProxyOwnerSupervisor-no-dc#2033451887]] after [5000 ms]. Message of type [org.opendaylight.controller.eos.akka.owner.supervisor.command.ClearCandidatesForMember]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.actor.typed.scaladsl.AskPattern$.$anonfun$onTimeout$1(AskPattern.scala:131) ~[bundleFile:?]
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:730) ~[bundleFile:?]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:479) ~[bundleFile:?]
    at scala.concurrent.ExecutionContext$parasitic$.execute(ExecutionContext.scala:222) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:365) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:314) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:318) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:270) ~[bundleFile:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?] |
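The warning above describes an ask that times out after 5000 ms and is then rescheduled. The following is a minimal sketch of that general pattern in plain Java, not the actual CandidateRegistryInit code: the class name `AskRetrySketch`, the method `askWithRetry`, and the attempt limit are hypothetical, and `CompletableFuture.orTimeout` stands in for the Akka ask timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the controller code base.
final class AskRetrySketch {
    private AskRetrySketch() {
    }

    /**
     * Issues the ask repeatedly until a reply arrives within the timeout or
     * the attempts are exhausted.
     *
     * @return the number of attempts used, or -1 if every attempt timed out
     */
    static <T> int askWithRetry(Supplier<CompletableFuture<T>> ask, long timeoutMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Fail the future with TimeoutException if no reply arrives in time.
                ask.get().orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
                return attempt;
            } catch (CompletionException e) {
                if (!(e.getCause() instanceof TimeoutException)) {
                    throw e; // only timeouts are retried
                }
                // Timed out: fall through and issue the ask again ("Rescheduling").
            }
        }
        return -1;
    }
}
```

If the recipient never replies, as the log suggests happened here, a loop like this keeps retrying until something else changes the cluster state, which would match the repeated warnings seen for about 5 minutes.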
| Comments |
| Comment by Robert Varga [ 06/Apr/22 ] |
|
Hmm, this looks rather weird. We are getting these:

2022-04-05T03:03:15,442 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | LocalActorRef | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Message [akka.actor.ReceiveTimeout$] to Actor[akka://opendaylight-cluster-data/system/IO-TCP/selectors/$a/9#-1115372167] was not delivered. [295] dead letters encountered, of which 284 were not logged. The counter will be reset now. If this is not an expected behavior then Actor[akka://opendaylight-cluster-data/system/IO-TCP/selectors/$a/9#-1115372167] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

Hunting through int/test, it seems openflowplugin is using an ancient akka.conf.template. Perhaps it needs an update? |
| Comment by Sangwook Ha [ 06/Apr/22 ] |
|
The template file is in the tools directory, outside of csit, and I think it's not really used for the test suite. |
| Comment by Robert Varga [ 06/Apr/22 ] |
|
Interesting, I wonder where that TCP reference is coming from. |
| Comment by Robert Varga [ 06/Apr/22 ] |
2022-04-05T02:58:10,693 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | Behavior$ | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Supervisor StopSupervisor saw failure: Ask timed out on [Actor[akka://opendaylight-cluster-data/system/typedDdataReplicator#380268694]] after [5000 ms]. Message of type [akka.cluster.ddata.typed.javadsl.Replicator$Get]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
java.util.concurrent.TimeoutException: Ask timed out on [Actor[akka://opendaylight-cluster-data/system/typedDdataReplicator#380268694]] after [5000 ms]. Message of type [akka.cluster.ddata.typed.javadsl.Replicator$Get]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.actor.typed.scaladsl.AskPattern$.$anonfun$onTimeout$1(AskPattern.scala:131) ~[bundleFile:?]
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:730) ~[bundleFile:?]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:479) ~[bundleFile:?]
    at scala.concurrent.ExecutionContext$parasitic$.execute(ExecutionContext.scala:222) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:365) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:314) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:318) ~[bundleFile:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:270) ~[bundleFile:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2022-04-05T02:58:10,700 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-14 | ClusterSingletonManager | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Singleton actor [akka://opendaylight-cluster-data/system/singletonManagerOwnerSupervisor/OwnerSupervisor] was terminated |