CONTROLLER-1706

Large transaction traffic prevents leader from being moved

    • Type: Bug
    • Resolution: Done
    • clustering
    • Operating System: All
      Platform: All

    • 8606

      This symptom affects current cluster testing, but it might no longer be critical once other bugs are fixed. It has only been seen on Sandbox so far.

      The robot symptom "Leader not found" [0] is similar to CONTROLLER-1693, but this time no "UnreachableMember" is seen in karaf.log [1] (perhaps because of CONTROLLER-1703).
      CONTROLLER-1675 is also similar, but here the leader-moving call (remove-shard-replica in this case) does not fail, and karaf.log just states that the leadership transfer was not successful:

      2017-06-06 07:51:36,160 | WARN | lt-dispatcher-28 | aftActorLeadershipTransferCohort | 193 - org.opendaylight.controller.sal-akka-raft - 1.5.1.SNAPSHOT | member-1-shard-default-config: Failed to transfer leadership in 10.01 s

      I suspect the large transaction is the creation of a large list at the start of write-transactions (see CONTROLLER-1703), but I am not sure whether the following proves or disproves that:

      2017-06-06 07:52:40,474 | INFO | ternal.Finalizer | lientBackedTransaction$Finalizer | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | Aborted orphan transaction ClientSnapshot{identifier=member-1-datastore-config-fe-0-txn-7-0}
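To check whether the two messages quoted above occur close together in a karaf.log, something like the following could be used. It is only a minimal sketch: the function name `scan_karaf_log` and both regular expressions are assumptions derived from the two log lines in this report, not from any OpenDaylight tooling, and the orphan-transaction pattern expects the identifier on the same line as the message.

```python
import re

# Both patterns are assumptions modelled on the two karaf.log lines
# quoted in this report; other log layouts may need adjustments.
TRANSFER_FAILED = re.compile(
    r"^(?P<timestamp>\S+ \S+)\s*\|\s*WARN\b.*\|\s*(?P<shard>\S+): "
    r"Failed to transfer leadership in (?P<seconds>\d+(?:\.\d+)?) s"
)
ORPHAN_ABORTED = re.compile(
    r"^(?P<timestamp>\S+ \S+)\s*\|\s*INFO\b.*Aborted orphan transaction "
    r"\w+\s*\{identifier=(?P<txn>[^}]+)\}"
)


def scan_karaf_log(lines):
    """Collect failed leadership transfers and aborted orphan transactions.

    `lines` is any iterable of log lines (for example an open file).
    Returns two lists: (timestamp, shard, seconds) tuples for failed
    transfers and (timestamp, transaction id) tuples for orphan aborts.
    """
    failed, orphans = [], []
    for line in lines:
        match = TRANSFER_FAILED.search(line)
        if match:
            failed.append(
                (match["timestamp"], match["shard"], float(match["seconds"]))
            )
            continue
        match = ORPHAN_ABORTED.search(line)
        if match:
            orphans.append((match["timestamp"], match["txn"]))
    return failed, orphans
```

For a gzipped log such as [1], the lines could be supplied with `gzip.open(path, "rt")` from the standard library after downloading the file. Comparing the timestamps in the two returned lists would show whether orphan aborts cluster around the failed transfer, though that alone would not settle the question raised above.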

      [0] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/log.html.gz#s1-s18-t1-k2-k12-k1-k3-k1
      [1] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/odl1_karaf.log.gz

            jmorvay@cisco.com Jakub Morvay
            vrpolak Vratko Polak