[CONTROLLER-2076] Two-Node Cluster Fails to Install Snapshot on Clean Follower Node
Created: 17/Apr/23  Updated: 22/Jan/24
|
| Status: | In Progress |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Sodium SR4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Peter Suna | Assignee: | Ivan Hrasko |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
The two-node cluster fails to start when the leader has stored snapshot data and the follower is started as a clean instance without any snapshot data. The leader node starts correctly, but the follower node repeatedly logs the following warning:

    2023-04-14T13:40:29,326 | WARN | Thread-35 | AbstractShardBackendResolver | 219 - org.opendaylight.controller.sal-distributed-datastore - 1.10.4 | Failed to resolve shard
    java.util.concurrent.TimeoutException: Connection attempt failed
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
    Caused by: org.opendaylight.controller.cluster.access.concepts.RetiredGenerationException: Originating generation 0 was superseded by 2

This issue was observed in the Sodium SR4 release; it still needs to be verified whether it is present in the current master. I built the environment in which I observed it from the integration-distribution repository, using Java 11.
Steps to reproduce the issue:

(Prepare environment)

    curl --request POST 'http://192.168.56.101:8181/rests/data/car:cars' \
      --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
      --header 'Content-Type: application/json' \
      --data '{ "car-entry": [ { "id": "id-'"$id"'-model", "model": "Lorem ipsum dolor....", "manufacturer": "Lorem ipsum dorem ....", "year": 198454, "category": "Lorem ipsum dolor ..." } ] }'

3) Verify that the cluster is working correctly, even after restarting both nodes with snapshots.

(Testing the issue)
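The `$id` splice in the request suggests it is run repeatedly with different ids, presumably to load enough data into the datastore for a snapshot to be taken. A minimal POSIX shell sketch of such a loop follows. It assumes the same RESTCONF URL and credentials as the request above; the `LEADER` variable and the `payload` and `post_entries` helpers are hypothetical names introduced only for illustration, and the entry count is left to the caller because this report does not state it.

```shell
#!/bin/sh
# Sketch only: repeat the POST above with increasing ids to fill car:cars.
# LEADER, payload and post_entries are hypothetical; adjust to your environment.
LEADER=${LEADER:-192.168.56.101}

# Print one car-entry payload (same shape as the original request) for id $1.
payload() {
  printf '{ "car-entry": [ { "id": "id-%s-model", "model": "Lorem ipsum dolor....", "manufacturer": "Lorem ipsum dorem ....", "year": 198454, "category": "Lorem ipsum dolor ..." } ] }' "$1"
}

# POST entries with ids 1..$1 to the RESTCONF endpoint on $LEADER.
post_entries() {
  id=1
  while [ "$id" -le "$1" ]; do
    curl --silent --show-error --request POST "http://$LEADER:8181/rests/data/car:cars" \
      --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
      --header 'Content-Type: application/json' \
      --data "$(payload "$id")"
    id=$((id + 1))
  done
}
```

After sourcing the script, `post_entries 1000` would send 1000 requests; that count is an arbitrary example, not the value used when the issue was observed.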
|
| Comments |
| Comment by Robert Varga [ 11/May/23 ] |
|
Note that having only two members configured is not something we support – RAFT requires an odd number of members to prevent split-brain problems. |
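The majority-quorum arithmetic behind this recommendation can be illustrated with a small POSIX shell sketch. It assumes the standard RAFT rule that electing a leader or committing an entry requires a strict majority of the voting members, floor(n/2) + 1; the function names `quorum` and `tolerated_failures` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Majority quorum for an n-member voting group: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that may fail while a majority of the group remains available.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For two members the quorum is both nodes, so losing either one leaves the cluster unable to elect a leader or commit. An even-sized group also tolerates no more failures than the odd-sized group one member smaller: four members tolerate one failure, the same as three.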