bgpcep / BGPCEP-601

B and C: rejoining an isolated node leaves exabgp unable to connect to any cluster node


    • Type: Bug
    • Resolution: Done
    • Bugzilla Migration
    • BGP
    • None
    • Operating System: All
      Platform: All

    • 7222

      The problem was observed in https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-3node-periodic-bgpclustering-ha-only-boron/86 and https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-3node-periodic-bgpclustering-ha-only-carbon/82/

      The suite with a karaf restart works, but the suite in which the BGP leader is isolated fails.
      https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-ha-only-boron/86/archives/
      The Robot logs show that ExaBGP connected successfully to node3 (10.29.13.32):
      Connected to peer neighbor 10.29.13.32 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)
      After the node was isolated, ExaBGP reconnected to node2 (10.29.12.215), but the connection was dropped immediately afterwards:
      Connected to peer neighbor 10.29.12.215 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)
      Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,0)] error[Cease / Unspecific]

      Tue, 22 Nov 2016 03:44:18 | INFO | 22747 | reactor | New peer setup: neighbor 10.29.13.32 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open
      Tue, 22 Nov 2016 03:44:18 | INFO | 22747 | reactor | New peer setup: neighbor 10.29.12.30 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open
      Tue, 22 Nov 2016 03:44:18 | INFO | 22747 | reactor | New peer setup: neighbor 10.29.12.215 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open
      Tue, 22 Nov 2016 03:44:18 | WARNING | 22747 | configuration | Loaded new configuration successfully
      Tue, 22 Nov 2016 03:44:21 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:21 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:21 | INFO | 22747 | network | Connected to peer neighbor 10.29.13.32 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)
      Tue, 22 Nov 2016 03:44:25 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:26 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:30 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:31 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:35 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:36 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:41 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]
      Tue, 22 Nov 2016 03:44:42 | INFO | 22747 | network | Connected to peer neighbor 10.29.12.215 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)
      Tue, 22 Nov 2016 03:44:43 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,0)] error[Cease / Unspecific]
      Tue, 22 Nov 2016 03:44:43 | INFO | 22747 | network | Peer 10.29.13.32 ASN 64496 out loop, peer reset, message [notification received (6,0)] error[Cease / Unspecific]
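
      To make the sequence in the ExaBGP log above easier to follow, here is a minimal sketch that groups the connect and reset events by peer. It is not part of the test suite; the function name peer_timeline and the regular expressions are assumptions based only on the line format shown in this report. The Cease subcode names follow RFC 4486.

```python
import re
from collections import defaultdict

# Cease subcodes (NOTIFICATION error code 6) as listed in RFC 4486.
CEASE_SUBCODES = {
    0: "Unspecific",
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}

# Patterns are assumptions derived from the log lines quoted in this report.
CONNECTED = re.compile(r"Connected to peer neighbor (\S+)")
RESET = re.compile(
    r"Peer (\S+) ASN \d+ out loop, peer reset, "
    r"message \[notification received \((\d+),(\d+)\)\]"
)
TIME = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def peer_timeline(lines):
    """Return {peer_ip: [(time, event), ...]} for connect and reset events."""
    timeline = defaultdict(list)
    for line in lines:
        found = TIME.search(line)
        stamp = found.group(1) if found else "?"
        match = CONNECTED.search(line)
        if match:
            timeline[match.group(1)].append((stamp, "connected"))
            continue
        match = RESET.search(line)
        if match:
            peer = match.group(1)
            code, sub = int(match.group(2)), int(match.group(3))
            if code == 6:
                name = CEASE_SUBCODES.get(sub, "unknown subcode %d" % sub)
                event = "Cease / " + name
            else:
                event = "notification (%d,%d)" % (code, sub)
            timeline[peer].append((stamp, event))
    return dict(timeline)


# Five lines copied from the log above.
SAMPLE = [
    "Tue, 22 Nov 2016 03:44:21 | INFO | 22747 | network | Connected to peer neighbor 10.29.13.32 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)",
    "Tue, 22 Nov 2016 03:44:41 | INFO | 22747 | network | Peer 10.29.12.30 ASN 64496 out loop, peer reset, message [notification received (6,5)] error[Cease / Connection Rejected]",
    "Tue, 22 Nov 2016 03:44:42 | INFO | 22747 | network | Connected to peer neighbor 10.29.12.215 local-ip 10.29.13.14 local-as 64496 peer-as 64496 router-id 10.29.13.14 family-allowed in-open (out)",
    "Tue, 22 Nov 2016 03:44:43 | INFO | 22747 | network | Peer 10.29.12.215 ASN 64496 out loop, peer reset, message [notification received (6,0)] error[Cease / Unspecific]",
    "Tue, 22 Nov 2016 03:44:43 | INFO | 22747 | network | Peer 10.29.13.32 ASN 64496 out loop, peer reset, message [notification received (6,0)] error[Cease / Unspecific]",
]

if __name__ == "__main__":
    for peer, events in sorted(peer_timeline(SAMPLE).items()):
        print(peer, events)
```

      Grouped this way, the sample shows 10.29.12.215 connecting at 03:44:42 and being reset with Cease / Unspecific at 03:44:43, the same second in which the session to 10.29.13.32 is reset.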

      In the logs of node2 we can find:
      2016-11-22 03:44:42,074 | INFO | entLoopGroup-8-6 | AbstractBGPSessionNegotiator | 264 - org.opendaylight.bgpcep.bgp-rib-impl - 0.6.2.SNAPSHOT | BGP Session with peer [id: 0x059dff3e, L:/10.29.12.215:1790 - R:/10.29.13.14:42429] established successfully.
      ...
      2016-11-22 03:44:43,120 | WARN | ult-dispatcher-2 | ConcurrentDOMDataBroker | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.SNAPSHOT | Tx: DOM-CHAIN-12-1 Error during phase CAN_COMMIT, starting Abort
      OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}

      So it looks like the peer configuration was deleted, which should not have happened.
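
      For readers unfamiliar with this failure mode, the following toy model sketches how an optimistic-lock conflict of the "Node was deleted by other transaction" kind can arise. It is an illustration under stated assumptions, not the OpenDaylight datastore implementation: ToyStore, ToyTx, ConflictingModification and the path "peer/10.29.13.14" are all hypothetical names, and the real data tree performs more detailed conflict checks.

```python
class ConflictingModification(Exception):
    """Raised when a path changed after the transaction first touched it."""


class ToyStore:
    """Hypothetical in-memory store with a per-path version counter."""

    def __init__(self):
        self.data = {}      # path -> value
        self.versions = {}  # path -> counter bumped on every write or delete

    def begin(self):
        return ToyTx(self)


class ToyTx:
    """Hypothetical transaction that validates versions only at commit time."""

    def __init__(self, store):
        self.store = store
        self.seen = {}    # path -> version observed when first touched
        self.writes = {}  # path -> new value, or None for a delete

    def _touch(self, path):
        self.seen.setdefault(path, self.store.versions.get(path, 0))

    def merge(self, path, value):
        self._touch(path)
        self.writes[path] = value

    def delete(self, path):
        self._touch(path)
        self.writes[path] = None

    def commit(self):
        # "Can commit" check: every touched path must be unchanged.
        for path, version in self.seen.items():
            if self.store.versions.get(path, 0) != version:
                raise ConflictingModification(
                    "%s was changed by another transaction" % path
                )
        # Apply the buffered writes and bump the versions.
        for path, value in self.writes.items():
            self.store.versions[path] = self.store.versions.get(path, 0) + 1
            if value is None:
                self.store.data.pop(path, None)
            else:
                self.store.data[path] = value


def demo():
    """Return True if the session transaction loses to a concurrent delete."""
    store = ToyStore()
    peer = "peer/10.29.13.14"  # hypothetical path, for illustration only

    setup = store.begin()
    setup.merge(peer, {"state": "configured"})
    setup.commit()

    session_tx = store.begin()  # opened when the session is established
    session_tx.merge(peer, {"state": "established"})

    cleanup = store.begin()     # concurrent transaction deletes the peer
    cleanup.delete(peer)
    cleanup.commit()

    try:
        session_tx.commit()
    except ConflictingModification:
        return peer not in store.data
    return False


if __name__ == "__main__":
    print("conflict reproduced:", demo())
```

      In this model the second writer fails at commit time because the node it touched was deleted in the meantime, which matches the shape of the exception above but says nothing about which component issued the delete.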

      This bug may be similar to https://bugs.opendaylight.org/show_bug.cgi?id=6788, but the logs don't show the same problem.

            cdgasparini Claudio David Gasparini
            pgubka@cisco.com Peter Gubka