controller / CONTROLLER-1667

Failure in singleton isolation longevity test


    • Type: Bug
    • Resolution: Unresolved
    • clustering
    • Operating System: All
      Platform: All

    • 8420

      The recent run [0] failed, and this time it really looks like a bug in ODL.

      A single scenario iteration looks like this [1]. The fourth iteration failed [2] because the rejoining member-1 did not report the value from the new singleton instance (member-2).

      The karaf.log of member-1 [3] shows the sequence below.
      My current hypothesis is that when the isolated leader rejoins and learns the new entity ownership state, the corresponding data tree change notification is (lost and) not regenerated, so the rejoining member does not know it should close its singleton instance.
      (It should probably close the instance as soon as it detects its isolated status, but that is a separate bug, not tested by the suite.)
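
      To make the hypothesis concrete, here is a toy model in Python. It is not ODL code: `OwnershipView`, `SingletonService`, `overwrite_silently` and `rejoin` are hypothetical names invented for illustration. It only shows how a listener-driven singleton can stay active if the ownership state is replaced without a change notification.

```python
class OwnershipView:
    """Toy stand-in for one member's view of entity ownership (not ODL code)."""

    def __init__(self, owner):
        self.owner = owner
        self.listeners = []

    def apply_change(self, new_owner):
        """Expected path: update the state and notify every listener."""
        old_owner, self.owner = self.owner, new_owner
        for listener in self.listeners:
            listener(old_owner, new_owner)

    def overwrite_silently(self, new_owner):
        """Hypothesised faulty path: state is replaced, nobody is notified."""
        self.owner = new_owner


class SingletonService:
    """Toy singleton that is active only while its member owns the entity."""

    def __init__(self, member):
        self.member = member
        self.active = False

    def on_owner_changed(self, old_owner, new_owner):
        self.active = (new_owner == self.member)


def rejoin(notify):
    """Return True if member-1's instance is still active after rejoining."""
    view = OwnershipView("member-1")
    service = SingletonService("member-1")
    service.active = True  # member-1 owned the entity before isolation
    view.listeners.append(service.on_owner_changed)
    if notify:
        view.apply_change("member-2")
    else:
        view.overwrite_silently("member-2")
    return service.active


if __name__ == "__main__":
    print("stale instance without notification:", rejoin(notify=False))
    print("stale instance with notification:", rejoin(notify=True))
```

      In this model the view ends up with the correct owner in both cases; only the missing callback leaves the old instance running, which would match the symptom in [2] if the hypothesis holds.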

      2017-05-10 19:07:36,747 | WARN | lt-dispatcher-20 | ShardDataTree | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational: Current transaction member-1-entity-ownership-internal-fe-0-txn-37-0 has timed out after 15000 ms in state COMMIT_PENDING
      2017-05-10 19:07:36,747 | WARN | lt-dispatcher-20 | ShardDataTree | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational: Transaction member-1-entity-ownership-internal-fe-0-txn-37-0 is still committing, cannot abort
      2017-05-10 19:07:37,689 | INFO | lt-dispatcher-22 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (IsolatedLeader): Term 21 in "AppendEntriesReply [term=21, success=false, followerId=member-3-shard-entity-ownership-operational, logLastIndex=60, logLastTerm=21, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]" message is greater than leader's term 20 - switching to Follower
      2017-05-10 19:07:37,689 | INFO | lt-dispatcher-22 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (IsolatedLeader) :- Switching from behavior IsolatedLeader to Follower, election term: 21
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-6 | RoleChangeNotifier | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | RoleChangeNotifier for member-1-shard-entity-ownership-operational , received role change from IsolatedLeader to Follower
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-1-shard-entity-ownership-operational, leaderId=null, leaderPayloadVersion=-1]
      2017-05-10 19:07:37,690 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received role changed for member-1-shard-entity-ownership-operational from IsolatedLeader to Follower
      2017-05-10 19:07:37,846 | INFO | lt-dispatcher-17 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-inventory-config (Candidate): Cannot append entries because sender's term 6 is less than 20
      2017-05-10 19:07:37,876 | INFO | ult-dispatcher-2 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational: Received LeaderStateChanged message: LeaderStateChanged [memberId=member-1-shard-entity-ownership-operational, leaderId=member-3-shard-entity-ownership-operational, leaderPayloadVersion=5]
      2017-05-10 19:07:37,880 | INFO | ult-dispatcher-7 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational Received follower initial sync status for member-1-shard-entity-ownership-operational status sync done false
      2017-05-10 19:07:37,880 | INFO | lt-dispatcher-17 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational (Follower): Removing entries from log starting at 55
      2017-05-10 19:07:37,882 | INFO | lt-dispatcher-17 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | shard-manager-operational Received follower initial sync status for member-1-shard-entity-ownership-operational status sync done true
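
      For anyone repeating this analysis on the full karaf.log [3], a minimal Python sketch that pulls the Raft behavior switches out of lines in the format above. The field layout (six `|`-separated columns) and the message wording are assumed from this excerpt only; `Transition` and `transitions` are hypothetical helper names.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Transition(NamedTuple):
    timestamp: str
    shard: str
    old: str
    new: str
    term: int


# Message format assumed from the excerpt, e.g.
# "<shard> (IsolatedLeader) :- Switching from behavior IsolatedLeader to Follower, election term: 21"
SWITCH = re.compile(
    r"(?P<shard>\S+) \(\w+\) :- Switching from behavior "
    r"(?P<old>\w+) to (?P<new>\w+), election term: (?P<term>\d+)"
)


def transitions(lines: Iterable[str]) -> Iterator[Transition]:
    """Yield one Transition per 'Switching from behavior' log line."""
    for line in lines:
        # Columns: timestamp | level | thread | logger | bundle | message
        fields = [field.strip() for field in line.split("|", 5)]
        if len(fields) != 6:
            continue  # continuation line or unrelated format
        match = SWITCH.search(fields[5])
        if match:
            yield Transition(
                fields[0],
                match["shard"],
                match["old"],
                match["new"],
                int(match["term"]),
            )


if __name__ == "__main__":
    import sys

    # Usage: zcat odl1_karaf.log.gz | python transitions.py
    for t in transitions(sys.stdin):
        print(t.timestamp, t.shard, t.old, "->", t.new, "term", t.term)
```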

      [0] https://jenkins.opendaylight.org/releng/job/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/log.html.gz#s1-t1-k3-k1-k1-k1-k1-k1-k1-k1-k1-k1
      [2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/log.html.gz#s1-t1-k3-k1-k1-k1-k1-k1-k1-k2-k1-k1-k5-k3-k1-k2
      [3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-partnheal-longevity-only-carbon/4/archives/odl1_karaf.log.gz

            Assignee: Unassigned
            Reporter: Vratko Polak (vrpolak)