Bug
Resolution: Unresolved
Medium
None
Sodium SR4
The two-node cluster fails to start when the leader has stored snapshot data and the follower is started as a clean instance without any snapshot data. The leader node starts correctly, but the follower node repeatedly logs the following warning:
2023-04-14T13:40:29,326 | WARN | Thread-35 | AbstractShardBackendResolver | 219 - org.opendaylight.controller.sal-distributed-datastore - 1.10.4 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.opendaylight.controller.cluster.access.concepts.RetiredGenerationException: Originating generation 0 was superseded by 2
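To tell this repeating failure apart from a single transient timeout, the occurrences in the follower's log can be counted over time. The following is a minimal sketch: it assumes the default Karaf log location `data/log/karaf.log` relative to the ODL directory, and the function name `count_retired_generation` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention the
# RetiredGenerationException seen in the report.
# Defaults to the standard Karaf log path inside the ODL directory.
count_retired_generation() {
  local log="${1:-data/log/karaf.log}"
  if [ ! -f "$log" ]; then
    echo "log file not found: $log" >&2
    return 1
  fi
  # grep -c prints 0 but exits with status 1 when nothing matches,
  # so "|| true" keeps a zero count from being treated as an error.
  grep -c 'RetiredGenerationException' "$log" || true
}
```

Running it a few times, a minute apart, from the follower's ODL directory shows whether the count keeps growing; a count that stays at 0 would suggest the follower is not hitting this failure.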
This issue was observed in the Sodium SR4 release; it still needs to be verified whether it is also present on the current master branch.
I built the environment in which I observed this issue from the integration-distribution repository (release/sodium-sr4 branch), using Java 11:
https://github.com/opendaylight/integration-distribution/tree/release/sodium-sr4
Steps to reproduce the issue are as follows:
(Prepare environment)
1) Start the two-node cluster with the following features installed and verify that it works correctly. The initial configuration is included in the attachments.
`feature:install odl-netconf-clustered-topology odl-restconf-nb-rfc8040 odl-clustering-test-app`
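2) Add data until ODL creates snapshots; in my setup the snapshot size was around 500 MB. Entries can be added with requests like the following, where `$id` is changed for each request: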
curl --request POST 'http://192.168.56.101:8181/rests/data/car:cars' \
  --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
  --header 'Content-Type: application/json' \
  --data '{ "car-entry": [ { "id": "id-'"$id"'-model", "model": "Lorem ipsum dolor....", "manufacturer": "Lorem ipsum dorem ....", "year": 198454, "category": "Lorem ipsum dolor ..." } ] }'
3) Restart both nodes and verify that the cluster still works correctly when both nodes start from their snapshots.
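The preparation steps can be scripted roughly as below. This is a sketch under stated assumptions, not the script used for this report: the address comes from the request above, `admin:admin` is what the `Authorization` header decodes to, the helper names are hypothetical, and the field values (including `year`) are placeholders standing in for the elided Lorem ipsum strings.

```shell
#!/bin/sh
# Address and credentials from the report (the Basic header decodes to
# admin:admin); override via environment variables for other setups.
ODL_URL="${ODL_URL:-http://192.168.56.101:8181}"
ODL_AUTH="${ODL_AUTH:-admin:admin}"

# Print a string of N 'x' characters, used to make each entry larger.
make_filler() {
  head -c "$1" /dev/zero | tr '\0' 'x'
}

# Print the JSON body for one car entry; all values except the id are
# hypothetical placeholders.
build_car_payload() {
  printf '{"car-entry":[{"id":"id-%s-model","model":"%s","manufacturer":"Lorem ipsum","year":1984,"category":"Lorem ipsum"}]}' "$1" "${2:-}"
}

# Step 2: POST one entry per id in [first, last], like the request above.
# The optional third argument is the filler length per entry.
load_cars() {
  local first="$1" last="$2" filler id
  filler="$(make_filler "${3:-1000}")"
  id="$first"
  while [ "$id" -le "$last" ]; do
    curl --silent --show-error --fail --user "$ODL_AUTH" \
      --request POST "$ODL_URL/rests/data/car:cars" \
      --header 'Content-Type: application/json' \
      --data "$(build_car_payload "$id" "$filler")" >/dev/null || return 1
    id=$((id + 1))
  done
}

# Step 3: after restarting both nodes, read the list back from the
# configuration datastore; curl --fail returns non-zero on an HTTP error.
check_cars() {
  curl --silent --show-error --fail --user "$ODL_AUTH" \
    "$ODL_URL/rests/data/car:cars?content=config" >/dev/null
}
```

For example, `load_cars 1 10000 5000` followed, after restarting both nodes, by `check_cars`. How many entries are needed before a snapshot is written depends on the datastore snapshot settings (such as `snapshot-batch-count` in `etc/org.opendaylight.controller.cluster.datastore.cfg`), so the range may need adjusting. Reading the list back is only a coarse health check.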
(Testing the issue)
1) On the follower node, replace the ODL directory with a clean ODL distribution.
2) Start the ODL leader, and then start the follower node with the required Karaf features installed.
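The test steps can be sketched as follows. The sketch assumes the stock Karaf `bin/start` and `bin/client` scripts (where `bin/client` runs the command passed on its command line) and assumes that `CLEAN_DIST` is an unpacked distribution that already carries the follower's cluster configuration but has never been started, so it has no journal or snapshot data. Both directory paths and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your layout.
FOLLOWER_HOME="${FOLLOWER_HOME:-/opt/odl}"
CLEAN_DIST="${CLEAN_DIST:-/opt/odl-clean}"
FEATURES="odl-netconf-clustered-topology odl-restconf-nb-rfc8040 odl-clustering-test-app"

# Step 1: replace the follower's ODL directory with a clean copy.
# ${VAR:?} aborts instead of running "rm -rf" on an empty path.
reset_follower() {
  rm -rf "${FOLLOWER_HOME:?}"
  cp -R "${CLEAN_DIST:?}" "$FOLLOWER_HOME"
}

# Step 2 (run after the leader is up): start Karaf in the background and
# install the features through the Karaf client, retrying while the
# console is still starting up.
start_follower() {
  local tries=0
  "$FOLLOWER_HOME/bin/start"
  while [ "$tries" -lt 30 ]; do
    # $FEATURES is unquoted on purpose so each feature is its own argument.
    if "$FOLLOWER_HOME/bin/client" feature:install $FEATURES; then
      return 0
    fi
    tries=$((tries + 1))
    sleep 10
  done
  return 1
}
```

After `reset_follower` and `start_follower`, the follower's `data/log/karaf.log` shows whether the RetiredGenerationException warning starts repeating.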