Type: Bug
Resolution: Unresolved
Priority: Medium
Fix Version/s: None
Affects Version/s: Sodium SR4
Component/s: clustering
Labels:
- pt

The two-node cluster fails to start when the leader has stored snapshot data and the follower is started as a clean instance with no snapshot data. The leader node starts correctly, but the follower node repeatedly logs the following warning:

2023-04-14T13:40:29,326 | WARN  | Thread-35        | AbstractShardBackendResolver     | 219 - org.opendaylight.controller.sal-distributed-datastore - 1.10.4 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.opendaylight.controller.cluster.access.concepts.RetiredGenerationException: Originating generation 0 was superseded by 2

This issue was observed in the Sodium SR4 release. Whether it is also present in current master still needs to be verified.

I built the environment in which I observed this issue from the integration-distribution repository, using Java 11:
https://github.com/opendaylight/integration-distribution/tree/release/sodium-sr4

Steps to reproduce the issue are as follows:

(Prepare environment)
1) Start the two-node cluster and verify that it is working correctly. The initial configuration is included in the attachments.
`feature:install odl-netconf-clustered-topology odl-restconf-nb-rfc8040 odl-clustering-test-app`
2) Add enough data for ODL to create snapshots. My snapshot size was around 500 MB.

    curl --request POST 'http://192.168.56.101:8181/rests/data/car:cars' \
    --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
    --header 'Content-Type: application/json' \
    --data '{
                "car-entry": [
                    {
                        "id": "id-'"$id"'-model",
                        "model": "Lorem ipsum dolor....",
                        "manufacturer": "Lorem ipsum dorem ....",
                        "year": 198454,
                        "category": "Lorem ipsum dolor ..."
                    }
                ]
            }'

3) Verify that the cluster still works correctly after both nodes are restarted with their snapshots in place.
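
The request in step 2 refers to a shell variable `$id`, which suggests it was run in a loop. A minimal sketch of such a loop follows, assuming the same endpoint and credentials as the request above. The function names `car_payload` and `populate_cars`, the `ODL_URL` variable, the entry count, and the shortened filler strings are hypothetical and not taken from the original setup.

```shell
#!/bin/sh
# Sketch: POST many car entries so that the shards accumulate snapshot data.
# Assumptions (not from the report): function names, ODL_URL, filler text, entry count.

ODL_URL="${ODL_URL:-http://192.168.56.101:8181/rests/data/car:cars}"

# Print the JSON body for one car entry; $1 is the numeric id.
car_payload() {
    printf '{"car-entry":[{"id":"id-%s-model","model":"%s","manufacturer":"%s","year":%s,"category":"%s"}]}\n' \
        "$1" "Lorem ipsum dolor" "Lorem ipsum dolor" 198454 "Lorem ipsum dolor"
}

# POST entries with ids 1..$1; stop at the first failed request.
populate_cars() {
    id=1
    while [ "$id" -le "$1" ]; do
        car_payload "$id" | curl --silent --show-error --fail \
            --request POST "$ODL_URL" \
            --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
            --header 'Content-Type: application/json' \
            --data @- || return 1
        id=$((id + 1))
    done
}

# Example (the count is arbitrary; choose one large enough to trigger snapshots):
# populate_cars 1000
```

Longer filler strings or a higher count would be needed to approach the snapshot size mentioned in step 2.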

(Testing the issue)
1) Replace the ODL folder on the follower node with a clean ODL distribution.
2) Start the ODL leader, and then start the follower node with the required Karaf features installed.
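
To confirm that the follower has hit this problem, its log can be searched for the exception shown above. A minimal sketch, assuming the follower log has been saved as `karafFollowerMember-2.log` (the name of the attached file); the function name `count_retired_generation` is hypothetical.

```shell
#!/bin/sh
# Sketch: count log lines that mention the retired-generation error.
# Assumption: $1 is the path to the follower's Karaf log file.

count_retired_generation() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'RetiredGenerationException' "$1"
}

# Example:
# count_retired_generation karafFollowerMember-2.log
```

A count that keeps growing while the follower is running would match the repeating failure described in this report.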


Attachments:
- karafFollowerMember-2.log (1.17 MB, 17/Apr/23 12:04 PM)
- karafLeaderMember-1.log (238 kB, 17/Apr/23 12:06 PM)
- member-1-akka.conf (1 kB, 17/Apr/23 12:02 PM)
- member-2-akka.conf (1 kB, 17/Apr/23 12:03 PM)
- modules.conf (0.9 kB, 17/Apr/23 12:01 PM)
- module-shards.conf (1 kB, 17/Apr/23 12:01 PM)

Assignee: Ivan Hrasko
Reporter: Peter Suna
Votes: 0
Watchers: 2
Created: 17/Apr/23 12:01 PM
Updated: 22/Jan/24 1:46 PM
