  controller / CONTROLLER-1876

Cluster leadership moves to another node when an isolated follower rejoins the cluster


Details

    • Priority: High

    Description

      After a follower node is isolated, the remaining two nodes form a healthy
      majority cluster with one leader and one follower. Sometimes, in our automated
      CSIT tests, leadership moves to a different node when the isolated follower
      rejoins.

      The expectation is that leadership does not change in this case. This is a
      sporadic failure in our controller CSIT jobs.

      We need to understand why this change happens and determine whether it is
      acceptable and/or expected. If it is, the tests need to be changed to account
      for this case. Even so, it seems inefficient, and it may be a condition we can
      avoid with a fix somewhere on the controller side.

      Example robot log

      The karaf logs are there too:

      odl 1
      odl 2
      odl 3

      Looking quickly at the Jenkins robot graphs, this specific test case seems to
      fail about 4-5 times out of 30 runs. I think the same test case exists in
      multiple suites, though, so it may fail there periodically too, making the
      failure more frequent overall. Either way, 4-5 failures out of 30 is enough to
      work with for now.


          People

            Tom Pantelis (tpantelis)
            Jamo Luhrsen (jluhrsen)
            Votes: 0
            Watchers: 2
