[BGPCEP-413] Closing session after a large RIB is sent kills BGPCEP Created: 18/Mar/16  Updated: 03/Mar/19  Resolved: 02/May/16

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Jozef Behran
Assignee: Milos Fabian
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
is duplicated by BGPCEP-422 org.opendaylight.protocol.bgp.rib.imp... Resolved
External issue ID: 5558

 Description   

If a peer sends a large batch of BGP updates (either a big initial RIB sync or a big change to its RIB after the sync has completed) and closes the session immediately after finishing the transmission, the BGPCEP feature dies. All further connections are accepted and everything appears to work normally, but the data received from the peers is silently discarded (silently from the point of view of the connected peers; the discarding is actually quite loud in karaf.log). The larger the batch of BGP updates, the more likely this situation is to occur.

karaf.log shows that after the connection to the peer with the big update closes, one or more OptimisticLockFailed exceptions appear ("Node was deleted by other transaction"). After them, data from all other peers is rejected with either a "CanCommit Failed: Transaction chain failed" or a "New transaction ABC raced with transaction XYZ" error, where "XYZ" is one of the transactions involved in the preceding OptimisticLockFailed exceptions.
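To check whether a given karaf.log shows this failure pattern, the three signatures quoted above can be counted with a small script. This is only a sketch: the function name `count_signatures` and the dictionary keys are made up for illustration, and nothing is assumed about the log layout beyond the quoted strings appearing somewhere on a line.

```python
import re

# Failure signatures quoted in the description above.
# The rest of each log line (timestamp, logger name, ...) is not assumed.
SIGNATURES = {
    "optimistic_lock_failed": re.compile(r"OptimisticLockFailed"),
    "chain_failed": re.compile(r"CanCommit Failed: Transaction chain failed"),
    "raced": re.compile(r"New transaction \S+ raced with transaction \S+"),
}


def count_signatures(lines):
    """Return how many of the given lines match each failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python scan.py data/log/karaf.log
    with open(sys.argv[1], errors="replace") as log:
        print(count_signatures(log))
```

A non-zero "optimistic_lock_failed" count followed by growing "chain_failed" or "raced" counts would be consistent with the state described here.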

This is just a quick bug report, intended mainly for myself, to make sure the bug does not get lost while debugging various aspects of the Internet Feed tests (and to get an ID to refer to it later). More information (logs etc.) will be reported or attached as it is acquired.



 Comments   
Comment by Jozef Behran [ 18/Mar/16 ]

A similar bug was encountered in the past: CONTROLLER-998. That bug was closed as "invalid" because the problem turned out to be caused by BGP using multiple writers (which is not valid). In this bug, BGP appears to still be "using multiple writers" in the scenario described: one writer writes data into the RIB while the other deletes the same data because the peer it came from was found dead.
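The suspected two-writer race can be illustrated with a toy model. This is not the datastore or BGPCEP code: `ToyStore` and all of its methods are hypothetical, and the "chain stays failed after the first conflict" behaviour is only an interpretation of the log messages quoted in the description.

```python
class OptimisticLockFailed(Exception):
    """Raised when a commit finds its target node gone (toy model)."""


class TransactionChainFailed(Exception):
    """Raised for every transaction once the chain has failed (toy model)."""


class ToyStore:
    """Hypothetical store: peer id -> routes, shared by two writers."""

    def __init__(self):
        self.peers = {}
        self.chain_failed = False

    def begin(self):
        # A new transaction remembers which peer nodes existed at its start.
        if self.chain_failed:
            raise TransactionChainFailed("Transaction chain failed")
        return set(self.peers)

    def commit_routes(self, snapshot, peer, routes):
        # Writer 1: stores routes received from the peer.
        if self.chain_failed:
            raise TransactionChainFailed("Transaction chain failed")
        if peer in snapshot and peer not in self.peers:
            # The node existed at begin() but was removed meanwhile.
            self.chain_failed = True
            raise OptimisticLockFailed("Node was deleted by other transaction")
        self.peers.setdefault(peer, {}).update(routes)

    def delete_peer(self, peer):
        # Writer 2: cleans up after a session that was found dead.
        self.peers.pop(peer, None)


store = ToyStore()
store.peers["peer-a"] = {}
snapshot = store.begin()          # writer 1 is still flushing updates
store.delete_peer("peer-a")       # writer 2 reacts to the closed session
try:
    store.commit_routes(snapshot, "peer-a", {"10.0.0.0/8": "attrs"})
except OptimisticLockFailed as err:
    print("writer 1:", err)
try:
    store.begin()                 # any other peer is now rejected too
except TransactionChainFailed as err:
    print("other peer:", err)
```

In this model the first conflict poisons the shared chain, so later writes from unrelated peers fail as well, which matches the symptom reported above.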

Comment by Milos Fabian [ 05/Apr/16 ]

stable/beryllium: https://git.opendaylight.org/gerrit/#/c/37141/

Comment by Robert Varga [ 13/Apr/16 ]

master: https://git.opendaylight.org/gerrit/37535
