[CONTROLLER-1673] Leader failed to add shard replica: Error loading snapshot file Created: 15/May/17  Updated: 25/Jul/23  Resolved: 17/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Jakub Morvay
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8444

 Description   

This happened [0] in a suite for prefix-based shards when adding a previously removed shard replica.

Looking at karaf.log [1], the test started at 06:13:58,161 and there are two errors (aside from the one in the response):

2017-05-14 06:14:33,442 | ERROR | am-dispatcher-69 | LocalSnapshotStore | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | Error loading snapshot file snapshots/snapshot-member-3-shard-id-ints%2521-config-31717-1494742432781, remaining attempts: 0
java.io.FileNotFoundException: snapshots/snapshot-member-3-shard-id-ints%2521-config-31717-1494742432781 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)[:1.8.0_121]
at java.io.FileInputStream.open(FileInputStream.java:195)[:1.8.0_121]
at java.io.FileInputStream.<init>(FileInputStream.java:138)[:1.8.0_121]
at org.opendaylight.controller.cluster.persistence.LocalSnapshotStore.tryDeserializeAkkaSnapshot(LocalSnapshotStore.java:156)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at org.opendaylight.controller.cluster.persistence.LocalSnapshotStore.deserialize(LocalSnapshotStore.java:143)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at org.opendaylight.controller.cluster.persistence.LocalSnapshotStore.doLoad(LocalSnapshotStore.java:119)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at org.opendaylight.controller.cluster.persistence.LocalSnapshotStore.lambda$doLoadAsync$0(LocalSnapshotStore.java:109)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:97)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]

2017-05-14 06:14:33,448 | ERROR | lt-dispatcher-32 | Shard | 174 - com.typesafe.akka.slf4j - 2.4.17 | Persistence failure when replaying events for persistenceId [member-3-shard-id-ints!-config]. Last known sequence number [0]
java.io.FileNotFoundException: snapshots/snapshot-member-3-shard-id-ints%2521-config-31717-1494742432781 (No such file or directory)

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/log.html.gz#s1-s27-t2-k2-k11-k2-k3-k1-k4-k7-k1
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/odl3_karaf.log.gz
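
One detail in the log above may be worth checking: the persistenceId is member-3-shard-id-ints!-config, while the missing file name contains %2521 where the "!" would be. URL-encoding "!" once gives %21, and encoding that result again gives %2521, so the name looks as if it was encoded twice. The sketch below only reproduces that arithmetic with the standard java.net.URLEncoder. The class and helper names are hypothetical, and whether double encoding is the actual cause is an assumption that this ticket does not confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SnapshotNameSketch {
    // Hypothetical helper (not ODL code): URL-encode a string as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // persistenceId taken from the Shard error in the log.
        String persistenceId = "member-3-shard-id-ints!-config";

        String encodedOnce = encode(persistenceId);  // "!" becomes "%21"
        String encodedTwice = encode(encodedOnce);   // "%" becomes "%25", giving "%2521"

        System.out.println(encodedOnce);
        System.out.println(encodedTwice);
    }
}
```

If one code path looks for a file under the singly encoded name and another under the doubly encoded one, the lookup would fail with FileNotFoundException as seen here. That is a guess to verify against the linked changes, not a statement of what the fix does.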



 Comments   
Comment by Jakub Morvay [ 15/May/17 ]

https://git.opendaylight.org/gerrit/#/c/57076/

Comment by Jakub Morvay [ 16/May/17 ]

Carbon:
https://git.opendaylight.org/gerrit/#/c/57075

Comment by Vratko Polak [ 16/May/17 ]

The same two exceptions also seem to happen when the leader is moved gracefully.

The suite itself fails on NotLeaderException [4], but these exceptions (e.g. at 23:35:45,335 in [5]) might be the reason why the new leader failed to settle.

[4] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/686/archives/log.html.gz#s1-s43-t2-k2-k14-k1-k1-k1
[5] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/686/archives/odl3_karaf.log.gz

Comment by Jakub Morvay [ 16/May/17 ]

Increasing priority to blocker, since this prevents persistence from working for prefix-based shards.

Comment by Peter Gubka [ 17/May/17 ]

Fix worked.
https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/692/archives/log.html.gz#s1-s22-t2

Generated at Wed Feb 07 19:56:09 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.