[BGPCEP-918] BGP session crash with unknown BGP Link State TLV Created: 24/Jul/20  Updated: 17/Sep/20

Status: Open
Project: bgpcep
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: High
Reporter: Olivier Dugeon Assignee: Ajay Lele
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bgpls-error-karaf.log     File bgpls-error.pcapng    

 Description   

When a Cisco router running IOS-XR 6.5.x or 7.0.x acts as the BGP Link State speaker, the BGP session crashes when ODL decodes an unknown Link State TLV.

First, LinkAttributesParser warns that a link attribute is not valid (in fact, the attribute is valid but the parser is not able to decode it). This produces a BGPDocumentedException in bgp-parser. After that, the BGP session crashes and ODL attempts to re-open it. However, the session has not been properly closed, i.e. the Cisco router does not see that it was closed. Thus, in response to the new BGP Open message, the router closes it with a CEASE=6 reason (which is normal, as the router considers that the peer is trying to re-open an ongoing BGP session). After that, ODL enters a loop in which it tries to open the BGP session and the router closes it with the CEASE=6 reason.

2020-07-15T19:31:47,720 | WARN  | epollEventLoopGroup-7-1 | LinkAttributesParser             | 215 - org.opendaylight.bgpcep.bgp-linkstate - 0.13.1 | TLV 1173 is not a valid link attribute, ignoring it
2020-07-15T19:31:47,989 | ERROR | epollEventLoopGroup-7-1 | BGPDocumentedException           | 223 - org.opendaylight.bgpcep.bgp-parser-api - 0.13.1 | Error = MALFORMED_ATTR_LIST

[...]

2020-07-15T19:31:58,217 | INFO  | epollEventLoopGroup-7-3 | BGPSessionImpl                   | 229 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.1 | Closing session: BGPSessionImpl
2020-07-15T19:31:58,217 | INFO  | epollEventLoopGroup-7-3 | BGPPeer                          | 229 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.1 | Session with peer 10.194.77.240 terminated: BGPTerminationReason{error=CEASE}
2020-07-15T19:31:58,218 | INFO  | epollEventLoopGroup-7-3 | BGPPeer                          | 229 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.1 | Closing session with peer

The only way to stop this behavior is to stop ODL or delete the BGP peer.

Here, the unknown TLV is a valid one, i.e. the MSD TLV.

The expected behavior is to raise only a warning message, without the BGPDocumentedException.
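
To make the expected behavior concrete, here is a minimal sketch of a lenient TLV walk over a BGP-LS attribute body (2-byte type, 2-byte length, then the value). It is not the ODL LinkAttributesParser code: the class name, the `KNOWN_TYPES` set and the use of `java.util.logging` are assumptions made for illustration only. An unrecognized type is logged and skipped; only a length that overruns the buffer is treated as malformed.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch, not the ODL LinkAttributesParser. */
final class LenientTlvParser {
    private static final Logger LOG = Logger.getLogger(LenientTlvParser.class.getName());

    // Illustrative subset of link attribute TLV types this sketch "understands"
    // (assumption: 1088 admin group, 1089 max link bandwidth, 1092 TE default metric).
    private static final Set<Integer> KNOWN_TYPES = Set.of(1088, 1089, 1092);

    private LenientTlvParser() {
    }

    /**
     * Walks the TLVs in the buffer and returns the values of known types.
     * Unknown types are logged at warning level and skipped, not rejected.
     */
    static Map<Integer, byte[]> parse(ByteBuffer buf) {
        Map<Integer, byte[]> known = new LinkedHashMap<>();
        while (buf.remaining() >= 4) {
            int type = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            if (length > buf.remaining()) {
                // Genuinely malformed: the declared length overruns the attribute.
                throw new IllegalArgumentException(
                    "TLV " + type + " declares " + length + " bytes, only "
                        + buf.remaining() + " remain");
            }
            byte[] value = new byte[length];
            buf.get(value);
            if (KNOWN_TYPES.contains(type)) {
                known.put(type, value);
            } else {
                // Unknown but well-formed TLV: warn and move on to the next one.
                LOG.warning("TLV " + type + " is not a known link attribute, ignoring it");
            }
        }
        return known;
    }
}
```

With this approach, an unknown TLV that sits between two known ones costs one warning and nothing else; the session is only at risk when the encoding itself is broken.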


 Comments   
Comment by Robert Varga [ 04/Sep/20 ]

odd22 can you capture a BGP trace of the conversation?

Comment by Olivier Dugeon [ 17/Sep/20 ]

Robert,

I just uploaded a Wireshark capture and karaf.log as requested.

The test was conducted against a Cisco IOS-XR 6.5.3 router (running as a VM) advertising, through BGP-LS, an IS-IS topology with Traffic Engineering, IPv6, Segment Routing and Extended Metrics. ODL uses the IP address 10.194.77.10 and the Cisco router uses 10.194.77.241.

In karaf.log, search for "BGPDocumentedException" to find the beginning of the problem: some TLVs are not known and thus not handled, but an exception is raised instead of the TLVs just being ignored. The problem then occurs because the exception closes the TCP channel without sending a BGP Close message. So, each time ODL attempts to re-establish the BGP session, the Cisco router sends back a BGP Notification message with Error code Cease (6).
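
For reference, in RFC 4271 terms a session is torn down by sending a NOTIFICATION message before the TCP connection is closed. The sketch below only illustrates the wire layout of such a message (16-byte all-ones marker, 2-byte length, 1-byte type, then error code, subcode and optional data). The class and method names are hypothetical and this is not how ODL builds its messages; the constants are the RFC 4271 values as I understand them.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a BGP NOTIFICATION encoder, not ODL code. */
final class BgpNotificationSketch {
    static final int MARKER_LENGTH = 16;
    static final int TYPE_NOTIFICATION = 3;
    // Error codes and subcode from RFC 4271.
    static final int ERROR_UPDATE_MESSAGE = 3;
    static final int SUBCODE_MALFORMED_ATTRIBUTE_LIST = 1;
    static final int ERROR_CEASE = 6;

    private BgpNotificationSketch() {
    }

    /** Encodes header + error code + subcode + data into one byte array. */
    static byte[] encode(int errorCode, int errorSubcode, byte[] data) {
        // marker (16) + length (2) + type (1) + code (1) + subcode (1) + data
        int length = MARKER_LENGTH + 2 + 1 + 2 + data.length;
        ByteBuffer buf = ByteBuffer.allocate(length);
        for (int i = 0; i < MARKER_LENGTH; i++) {
            buf.put((byte) 0xFF);
        }
        buf.putShort((short) length);
        buf.put((byte) TYPE_NOTIFICATION);
        buf.put((byte) errorCode);
        buf.put((byte) errorSubcode);
        buf.put(data);
        return buf.array();
    }
}
```

For example, `BgpNotificationSketch.encode(BgpNotificationSketch.ERROR_UPDATE_MESSAGE, BgpNotificationSketch.SUBCODE_MALFORMED_ATTRIBUTE_LIST, new byte[0])` would describe the MALFORMED_ATTR_LIST error seen in the log, and a message of this kind is what lets the remote router know the session is over.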

Don't hesitate to contact me for more explanation.

Regards

Olivier

Generated at Wed Feb 07 19:14:29 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.