  1. bgpcep
  2. BGPCEP-872

Controller does not send KEEPALIVE while learning prefixes


    • Type: Bug
    • Resolution: Unresolved
    • Priority: High

      The regression is detected here:

      https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-bgp-ingest-mixed-all-neon/

      The BGP scale test uses the play.py script to set up a BGP session and inject 500K prefixes into the controller:

      python play.py --amount 500000 --myip=10.30.171.84 --myport=17900 --peerip=10.30.170.47 --peerport=1790 --insert=10 --withdraw=9 --prefill 10 --update single --info --results bgp.csv &> play.py.out
      

      After ~3 minutes (the hold timer), the BGP script raises this error:

      2019-05-06 10:08:44,090 INFO BGP-Dummy-1: Iteration: 296000 - total remaining prefixes: 203991
      2019-05-06 10:08:44,726 ERROR BGP-Dummy-1: Peer has overstepped the hold timer.
      Unhandled exception in thread started by <function job at 0x7fefba965938>
      Traceback (most recent call last):
        File "play.py", line 2066, in job
          state.perform_one_loop_iteration()
        File "play.py", line 1958, in perform_one_loop_iteration
          self.timer.check_peer_hold_time(self.timer.snapshot_time)
        File "play.py", line 1429, in check_peer_hold_time
          raise RuntimeError("Peer has overstepped the hold timer.")
      RuntimeError: Peer has overstepped the hold timer.
      Traceback (most recent call last):
        File "play.py", line 2168, in <module>
          threaded_job(arguments)
        File "play.py", line 2162, in threaded_job
          rpcserver.serve_forever()
        File "/usr/lib/python2.7/SocketServer.py", line 231, in serve_forever
          poll_interval)
        File "/usr/lib/python2.7/SocketServer.py", line 150, in _eintr_retry
          return func(*args)
      KeyboardInterrupt
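
For readers unfamiliar with the check that fails in the traceback above, the following is a minimal sketch of a BGP hold-timer check. It is not the play.py implementation: the `HoldTimer` class and its method names are hypothetical, and the 180-second value is an assumption based on the "~3 minutes" mentioned in this report. Per RFC 4271, the timer restarts on every KEEPALIVE or UPDATE received from the peer, and a hold time of zero disables it.

```python
class HoldTimer(object):
    """Hypothetical helper: tracks when the peer must next be heard from."""

    def __init__(self, hold_time, now):
        self.hold_time = hold_time              # negotiated hold time, seconds
        self.peer_deadline = now + hold_time    # latest allowed silence end

    def message_received(self, now):
        # Any KEEPALIVE or UPDATE from the peer restarts the hold timer.
        self.peer_deadline = now + self.hold_time

    def check(self, now):
        # A hold time of zero means the timer is disabled (RFC 4271).
        if self.hold_time and now > self.peer_deadline:
            raise RuntimeError("Peer has overstepped the hold timer.")


# Usage with an assumed 180 s hold time and explicit timestamps.
timer = HoldTimer(180, now=0.0)
timer.check(179.0)               # still within the hold time
timer.message_received(100.0)    # peer sent something; deadline moves to 280
timer.check(250.0)               # fine
```

If the controller sends neither KEEPALIVE nor UPDATE for longer than the hold time, a check of this kind is what fires on the test-tool side.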
      

      And the session is disconnected:

      2019-05-06T10:08:46,632 | INFO  | epollEventLoopGroup-10-1 | BGPSessionImpl                   | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | End of input detected. Close the session.
      2019-05-06T10:08:46,633 | INFO  | epollEventLoopGroup-10-1 | BGPPeer                          | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Session with peer 10.30.171.99 went down
      2019-05-06T10:08:46,633 | INFO  | epollEventLoopGroup-10-1 | BGPPeer                          | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing session with peer
      2019-05-06T10:08:46,650 | INFO  | epollEventLoopGroup-10-1 | AbstractPeer                     | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://10.30.171.99}] removed
      2019-05-06T10:08:46,653 | INFO  | epollEventLoopGroup-10-1 | AbstractPeer                     | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing peer chain Uri{_value=bgp://10.30.171.99}
      2019-05-06T10:08:46,659 | INFO  | epollEventLoopGroup-10-1 | BGPSessionImpl                   | 242 - org.opendaylight.bgpcep.bgp-rib-impl - 0.11.1.SNAPSHOT | Closing session: BGPSessionImpl{channel=[id: 0x79806baf, L:/10.30.171.133:1790 ! R:/10.30.171.99:17900], state=UP}
      

      According to the play.py logs, the controller does not send any KEEPALIVE messages while it is learning the prefixes; this is the main reason the test fails. See the attached test tool logs: when the test works, the controller sends one KEEPALIVE roughly every minute.
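
One way to confirm the missing KEEPALIVEs independently of the tool logs is to count message types in the raw bytes received from the controller. The sketch below assumes such a byte stream has been captured separately; `count_message_types` is a hypothetical helper, not part of play.py. It relies only on the fixed BGP header layout from RFC 4271 (16-byte marker, 2-byte length, 1-byte type, with type 4 meaning KEEPALIVE).

```python
import struct

BGP_HEADER_LEN = 19   # 16-byte marker + 2-byte length + 1-byte type
BGP_KEEPALIVE = 4     # message type code defined in RFC 4271


def count_message_types(stream):
    """Return {type_code: count} for the complete BGP messages in a byte string."""
    counts = {}
    offset = 0
    while offset + BGP_HEADER_LEN <= len(stream):
        # Length and type follow the 16-byte marker, in network byte order.
        length, msg_type = struct.unpack_from("!HB", stream, offset + 16)
        if length < BGP_HEADER_LEN or offset + length > len(stream):
            break  # malformed or truncated message; stop parsing
        counts[msg_type] = counts.get(msg_type, 0) + 1
        offset += length
    return counts


# Usage: a synthetic stream of two KEEPALIVEs around one 23-byte UPDATE.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, BGP_KEEPALIVE)
update = b"\xff" * 16 + struct.pack("!HB", 23, 2) + b"\x00" * 4
counts = count_message_types(keepalive + update + keepalive)
print(counts.get(BGP_KEEPALIVE, 0))
```

A KEEPALIVE count of zero over a window longer than the hold time would match the behaviour described in this report.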

        1. bgp-tool-600k.txt
          65 kB
        2. bgp-tool-nok.txt
          35 kB
        3. bgp-tool-ok.txt
          58 kB

            Assignee: Unassigned
            Reporter: Luis Gomez (ecelgp)