CONTROLLER-1667 (project: controller)

Failure in singleton isolation longevity test


    Details

    • Type: Bug
    • Status: Confirmed
    • Resolution: Unresolved
    • Affects Version/s: unspecified
    • Fix Version/s: None
    • Component/s: clustering
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • External issue ID:
      8420

      Description

      The recent run [0] has failed; this time it really looks like a bug in ODL.

      A typical scenario iteration looks like this [1]. The fourth iteration failed [2] because the rejoining member-1 was not reporting the value from the new singleton instance (on member-2).

      In the karaf.log of member-1 [3], I see the sequence below.
      My current hypothesis is that when the isolated leader rejoins and learns the new state of entity ownership, the corresponding data tree change notification is lost and never re-generated, so the rejoining member does not know it should close its singleton instance.
      (It should probably close the instance as soon as it detects its isolated status, but that is a separate bug, not tested by this suite.)
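      To make the hypothesis concrete, here is a minimal sketch under a deliberately simplified model. OwnershipModel, OwnershipListener, SingletonHost and reconcile() are hypothetical names, not the ODL entity-ownership or cluster-singleton API. The sketch shows how a host that reacts only to delivered deltas (edge-triggered) keeps a stale singleton running when one notification is dropped, while a host that re-reads the current owner after a role change (level-triggered) recovers.

```java
import java.util.ArrayList;
import java.util.List;

// Demo class comes first so `java File.java` picks it as the entry point.
final class LostNotificationDemo {
    public static void main(String[] args) {
        OwnershipModel model = new OwnershipModel("member-1");
        SingletonHost member1 = new SingletonHost("member-1", true);
        model.addListener(member1);

        // member-1 rejoins and the owner is now member-2, but the delta is lost.
        model.setOwner("member-2", false);
        System.out.println("after lost notification: " + member1.isRunning()); // true (stale)

        // Re-reading the current state fixes the stale instance.
        member1.reconcile(model);
        System.out.println("after reconcile: " + member1.isRunning()); // false
    }
}

// Hypothetical callback for ownership deltas (NOT the ODL API).
interface OwnershipListener {
    void ownerChanged(String oldOwner, String newOwner);
}

// Hypothetical model: one owner per entity, changes delivered as deltas.
final class OwnershipModel {
    private String owner;
    private final List<OwnershipListener> listeners = new ArrayList<>();

    OwnershipModel(String initialOwner) { this.owner = initialOwner; }

    void addListener(OwnershipListener listener) { listeners.add(listener); }

    String owner() { return owner; }

    // When deliver is false the delta is dropped, modelling the "lost" notification.
    void setOwner(String newOwner, boolean deliver) {
        String oldOwner = owner;
        owner = newOwner;
        if (deliver && !newOwner.equals(oldOwner)) {
            for (OwnershipListener listener : listeners) {
                listener.ownerChanged(oldOwner, newOwner);
            }
        }
    }
}

// Hypothetical per-member singleton host.
final class SingletonHost implements OwnershipListener {
    private final String localMember;
    private boolean running;

    SingletonHost(String localMember, boolean running) {
        this.localMember = localMember;
        this.running = running;
    }

    boolean isRunning() { return running; }

    // Edge-triggered: only reacts if the delta is actually delivered.
    @Override
    public void ownerChanged(String oldOwner, String newOwner) {
        running = localMember.equals(newOwner);
    }

    // Level-triggered: re-read current state, e.g. after IsolatedLeader -> Follower.
    void reconcile(OwnershipModel model) {
        running = localMember.equals(model.owner());
    }
}
```

      The design point is that a reconciliation step triggered by the role change does not depend on any single notification surviving the partition.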

      2017-05-10 19:07:36,747 | WARN | lt-dispatcher-20 | ShardDataTree | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational: Current transaction member-1-entity-ownership-internal-fe-0-txn-37-0 has timed out after 15000 ms in state COMMIT_PENDING
      2017-05-10 19:07:36,747 | WARN | lt-dispatcher-20 | ShardDataTree | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational: Transaction member-1-entity-ownership-internal-fe-0-txn-37-0 is still committing, cannot abort
      2017-05-10 19:07:37,689 | INFO | lt-dispatcher-22 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (IsolatedLeader): Term 21 in "AppendEntriesReply [term=21, success=false, followerId=member-3-shard-entity-ownership-operational, logLastIndex=60, logLastTerm=21, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]" message is greater than leader's term 20 - switching to Follower
      2017-05-10 19:07:37,689 | INFO | lt-dispatcher-22 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (IsolatedLeader) :- Switching from behavior IsolatedLeader to Follower, election term: 21
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-6 | RoleChangeNotifier | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | RoleChangeNotifier for member-1-shard-entity-ownership-operational , received role change from IsolatedLeader to Follower
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-1-shard-entity-ownership-operational, leaderId=null, leaderPayloadVersion=-1]
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received role changed for member-1-shard-entity-ownership-operational from IsolatedLeader to Follower
      2017-05-10 19:07:37,846 | INFO | lt-dispatcher-17 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-inventory-config (Candidate): Cannot append entries because sender's term 6 is less than 20
      2017-05-10 19:07:37,876 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-1-shard-entity-ownership-operational, leaderId=member-3-shard-entity-ownership-operational, leaderPayloadVersion=5]
      2017-05-10 19:07:37,880 | INFO | ult-dispatcher-7 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational Received follower initial sync status for member-1-shard-entity-ownership-operational status sync done false
      2017-05-10 19:07:37,880 | INFO | lt-dispatcher-17 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (Follower): Removing entries from log starting at 55
      2017-05-10 19:07:37,882 | INFO | lt-dispatcher-17 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational Received follower initial sync status for member-1-shard-entity-ownership-operational status sync done true
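      The Raft transitions visible in the log follow the standard Raft rules: a leader (here IsolatedLeader) that sees a reply with a higher term (21 > 20) adopts that term and becomes Follower; a node rejects AppendEntries whose sender term is lower than its own (6 < 20); and a follower removes conflicting log entries from a given index onward (here from 55, presumably the entries appended while member-1 was isolated). The sketch below models only these three rules. RaftNode and Role are hypothetical names, not ODL's RaftActor classes, and log indices start at 0 for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Demo class comes first so `java File.java` picks it as the entry point.
final class RaftRulesDemo {
    public static void main(String[] args) {
        // IsolatedLeader in term 20 receives an AppendEntriesReply with term 21.
        RaftNode eos = new RaftNode(20, Role.ISOLATED_LEADER);
        eos.onAppendEntriesReply(21);
        System.out.println(eos.role() + " term=" + eos.term()); // FOLLOWER term=21

        // A candidate in term 20 rejects AppendEntries from a sender in term 6.
        RaftNode inventory = new RaftNode(20, Role.CANDIDATE);
        System.out.println("accepted: " + inventory.acceptAppendEntries(6)); // accepted: false

        // Conflicting entries are removed starting at index 55.
        for (int i = 0; i < 60; i++) {
            eos.append(20);
        }
        eos.removeEntriesFrom(55);
        System.out.println("log size: " + eos.logSize()); // log size: 55
    }
}

// Hypothetical role set mirroring the behaviors named in the log.
enum Role { LEADER, ISOLATED_LEADER, CANDIDATE, FOLLOWER }

// Hypothetical node modelling only the term and truncation rules.
final class RaftNode {
    private long term;
    private Role role;
    private final List<Long> logTerms = new ArrayList<>(); // term of each entry, index 0-based

    RaftNode(long term, Role role) {
        this.term = term;
        this.role = role;
    }

    long term() { return term; }

    Role role() { return role; }

    // A reply carrying a higher term means this node is stale: adopt it and step down.
    void onAppendEntriesReply(long replyTerm) {
        if (replyTerm > term) {
            term = replyTerm;
            role = Role.FOLLOWER;
        }
    }

    // Only the rejection check is modelled: lower-term senders are refused.
    boolean acceptAppendEntries(long senderTerm) {
        return senderTerm >= term;
    }

    void append(long entryTerm) { logTerms.add(entryTerm); }

    int logSize() { return logTerms.size(); }

    // Delete the entry at index and every entry after it.
    void removeEntriesFrom(int index) {
        if (index >= 0 && index < logTerms.size()) {
            logTerms.subList(index, logTerms.size()).clear();
        }
    }
}
```

      Note that none of these rules, by themselves, re-deliver data tree change notifications for state the rejoining follower receives from the new leader; that is the gap the hypothesis above points at.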

      [0] https://jenkins.opendaylight.org/releng/job/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/log.html.gz#s1-t1-k3-k1-k1-k1-k1-k1-k1-k1-k1-k1
      [2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/log.html.gz#s1-t1-k3-k1-k1-k1-k1-k1-k1-k2-k1-k1-k5-k3-k1-k2
      [3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/odl1_karaf.log.gz


              People

              Assignee:
              Unassigned
              Reporter:
              Vratko Polak (vrpolak)
              Votes:
              0
              Watchers:
              5
