CONTROLLER-2076

Two-Node Cluster Fails to Install Snapshot on Clean Follower Node

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • None
    • Sodium SR4
    • clustering

      A two-node cluster fails to start when the leader has stored snapshot data and the follower is started as a clean instance with no snapshot data. The leader node starts correctly, but the follower node repeatedly logs the following warning:

      2023-04-14T13:40:29,326 | WARN  | Thread-35        | AbstractShardBackendResolver     | 219 - org.opendaylight.controller.sal-distributed-datastore - 1.10.4 | Failed to resolve shard
      java.util.concurrent.TimeoutException: Connection attempt failed
          at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
          at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
          at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[219:org.opendaylight.controller.sal-distributed-datastore:1.10.4]
          at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
          at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
          at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
          at java.lang.Thread.run(Thread.java:829) [?:?]
      Caused by: org.opendaylight.controller.cluster.access.concepts.RetiredGenerationException: Originating generation 0 was superseded by 2 

      This issue was observed in the Sodium SR4 release. Whether it also occurs on the current master still needs to be verified.
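To check whether another environment hits the same failure, the follower log can be scanned for the exception named in the trace above. This is a minimal sketch, not part of the original report: the function name `count_retired_generation` is hypothetical, and it only counts matching lines.

```shell
#!/bin/sh
# Count how many log lines mention the superseded-generation failure.
# Usage: count_retired_generation path/to/karaf.log
count_retired_generation() {
    local log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'RetiredGenerationException' "$log_file" || true
}
```

For example, `count_retired_generation karafFollowerMember-2.log` reports how often the attached follower log recorded the exception.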

      To build the environment where I observed this issue, I used the integration-distribution repository with Java 11.
      https://github.com/opendaylight/integration-distribution/tree/release/sodium-sr4

      Steps to reproduce the issue are as follows:

      (Prepare environment)
      1) Start the two-node cluster and verify that it is working correctly. The initial configuration is included in the attachments.
      `feature:install odl-netconf-clustered-topology odl-restconf-nb-rfc8040 odl-clustering-test-app`
      2) Add data so that ODL creates some snapshots. My snapshot size is around 500 MB.

          curl --request POST 'http://192.168.56.101:8181/rests/data/car:cars' \
          --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
          --header 'Content-Type: application/json' \
          --data '{
                      "car-entry": [
                          {
                              "id": "id-'"$id"'-model",
                              "model": "Lorem ipsum dolor....",
                              "manufacturer": "Lorem ipsum dorem ....",
                              "year": 198454,
                              "category": "Lorem ipsum dolor ..."
                          }
                      ]
                  }' 
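The request above references a shell variable `$id`, which suggests it is meant to be repeated with different ids. One way to do that is sketched below, under assumptions: the function names `build_payload` and `populate_cars` are hypothetical, the text values are placeholders for the elided strings in the original request, and the number of requests needed to reach a given snapshot size is not stated in this report.

```shell
#!/bin/sh
# Build the JSON body for one car entry; only the id varies between requests.
# The text values are placeholders, not the strings used in the original test.
build_payload() {
    local id="$1"
    cat <<EOF
{
    "car-entry": [
        {
            "id": "id-${id}-model",
            "model": "placeholder model text",
            "manufacturer": "placeholder manufacturer text",
            "year": 198454,
            "category": "placeholder category text"
        }
    ]
}
EOF
}

# POST <count> car entries to the node at <base_url>.
# Usage: populate_cars 1000 http://192.168.56.101:8181
populate_cars() {
    local count="$1"
    local base_url="${2:-http://192.168.56.101:8181}"
    local id=1
    while [ "$id" -le "$count" ]; do
        # --fail makes curl return non-zero on HTTP errors, so the loop
        # stops at the first rejected request.
        curl --silent --show-error --fail --output /dev/null \
            --request POST "${base_url}/rests/data/car:cars" \
            --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
            --header 'Content-Type: application/json' \
            --data "$(build_payload "$id")" || return 1
        id=$((id + 1))
    done
}
```

Nothing is sent until `populate_cars` is called explicitly, so the payload can be inspected first with `build_payload 1`.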

      3) Verify that the cluster is working correctly, even after restarting both nodes with snapshots.

      (Testing the issue)
      1) Replace the ODL folder in the follower node with a clean ODL distribution.
      2) Start the ODL leader, and then start the follower node with the required Karaf features installed.
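Step 1 can be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used for this report: the function name `replace_with_clean_dist` and both paths are hypothetical, and it assumes the cluster configuration lives in `configuration/initial` and is carried over so that the clean node keeps its member identity.

```shell
#!/bin/sh
# Replace a follower installation with a clean distribution.
# Arguments (hypothetical paths):
#   $1  existing follower directory (kept aside as "$1.bak")
#   $2  unpacked clean ODL distribution
replace_with_clean_dist() {
    local install_dir="$1"
    local clean_dist="$2"
    local conf_dir="configuration/initial"

    # Keep the old installation, including its snapshots, for later comparison.
    mv "$install_dir" "${install_dir}.bak" || return 1
    cp -R "$clean_dist" "$install_dir" || return 1

    # Assumption: the cluster configuration files are copied over so the
    # clean node still joins the cluster as the same member.
    if [ -d "${install_dir}.bak/${conf_dir}" ]; then
        mkdir -p "${install_dir}/${conf_dir}" || return 1
        cp "${install_dir}.bak/${conf_dir}"/*.conf "${install_dir}/${conf_dir}/" || return 1
    fi
}
```

After the swap, the leader is started first and the follower second, with the features from the preparation step installed on the follower.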

       

        Attachments:
        1. karafFollowerMember-2.log (1.17 MB)
        2. karafLeaderMember-1.log (238 kB)
        3. member-1-akka.conf (1 kB)
        4. member-2-akka.conf (1 kB)
        5. modules.conf (0.9 kB)
        6. module-shards.conf (1 kB)

            ivanhrasko Ivan Hrasko
            PeterSuna Peter Suna
            Votes: 0
            Watchers: 2