[BGPCEP-491] bgp-ls update not being shown in the bgp topology in ODL Created: 06/Jul/16 Updated: 03/Mar/19 Resolved: 10/Aug/16 |
|
| Status: | Resolved |
| Project: | bgpcep |
| Component/s: | BGP |
| Affects Version/s: | Bugzilla Migration |
| Fix Version/s: | Bugzilla Migration |
| Type: | Bug | ||
| Reporter: | Claudio David Gasparini | Assignee: | Claudio David Gasparini |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 6160 | ||||||||||||||||
| Description |
|
bgp-ls update not being shown in the bgp topology in ODL |
| Comments |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log has been added with description: karaf.log |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.1.gz has been added with description: karaf.log1 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.2.gz has been added with description: karaf.log2 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.3.gz has been added with description: karaf.log3 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.4.gz has been added with description: karaf.log.4 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.5.gz has been added with description: karaf.log5 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Attachment karaf.log.6.gz has been added with description: karaf.log6 |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
This issue occurs when a link is shut and goes down. It is removed from the show mpls traffic-eng topology output, but not from the ODL topology. After the link was shut (or no shut), successive GETs on the bgp-ls topology (curl -u admin:admin http://localhost:8181/restconf/operational/network-topology:network-topology/topology/example-linkstate-topology) do not reflect the changed link state, i.e., the link-id object is always present. sh bgp li li on the bgp speaker has 256,232 entries. Summary of steps to reproduce this issue: 1. GET bgp-ls topology from ODL no difference in ODL topology 4. bring link up no differences in ODL topology Throttle timer or distribution is set: distribute bgp-ls instance 2 throttle 10 |
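One way to make the "successive GETs" comparison repeatable is to save each response and diff the link-ids. The following is a minimal sketch, not the reporter's procedure: it assumes the RESTCONF response is JSON with a top-level "topology" list whose entries carry a "link" list, and the two snapshots below are synthetic placeholders rather than data from this bug.

```python
import json


def link_ids(topology_doc):
    """Return the set of link-id strings found in a topology document."""
    ids = set()
    for topo in topology_doc.get("topology", []):
        for link in topo.get("link", []):
            ids.add(link["link-id"])
    return ids


def diff_snapshots(before, after):
    """Return (removed, added) link-ids between two saved GET responses."""
    old, new = link_ids(before), link_ids(after)
    return old - new, new - old


# Synthetic snapshots (assumed shape), standing in for two saved curl outputs.
before = json.loads(
    '{"topology": [{"topology-id": "example-linkstate-topology",'
    ' "link": [{"link-id": "link-a"}, {"link-id": "link-b"}]}]}'
)
after = json.loads(
    '{"topology": [{"topology-id": "example-linkstate-topology",'
    ' "link": [{"link-id": "link-a"}]}]}'
)

removed, added = diff_snapshots(before, after)
print("removed:", sorted(removed))
print("added:", sorted(added))
```

If the withdrawal had been applied to the topology, the shut link's link-id would be expected in the removed set; an empty diff after a link shut would match the behavior described above.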
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
Hi Al, Could you please attach package capture from both UPDATES. (Advertisement and Withdrawal) so I can reproduce it locally. Thanks |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
Not following. What are you referring to with 'package capture'? |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
Adding additional info. bgp link-state on router shows no route, yet ODL still holds the link-id. RP/0/RP0/CPU0:HUCRS1#sh bgp li li [E][O][I0x2][N[c65060][b10.0.0.208][a0.0.0.0$ ODL topology: [acg000@automation2 ~]$ grep link-id.*10.0.7.9[78] bgpls.core4 | grep area=0 |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
Continued to perform GETs on the bgp-ls topology, and the link-id is now removed from the topology. The time difference is ~50 minutes. I expect a topology change in ODL to be instantaneous.
link-id only present in bgpls.core4 file. [acg000@automation2 ~]$ grep link-id.*10.0.7.9[78] bgpls.core[45] | grep area=0 |
| Comment by Claudio David Gasparini [ 06/Jul/16 ] |
|
"package capture" = use wireshark to capture the packets that are sent and attach the .cap. |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
After no shut link advertisement not shown in ODL. I ran tshark for a few minutes (em1.pcap enclosed), for your review. I will GET topology every 5 minutes to see when the topology reflects the no shut link. |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
Enclosed is the pcap file from our ODL server which was run for several minutes during link advertisement. |
| Comment by Al Goddard [ 06/Jul/16 ] |
|
Attachment em1.pcap.gz has been added with description: pcap from ODL server for link advertisement |
| Comment by Claudio David Gasparini [ 08/Jul/16 ] |
|
Hi Al, what I'm expecting to find in the .cap is the message announcing the link and the one announcing the withdrawal. Based on what you said, you ran wireshark after the failure, so I can't do much with it. What I suspect may be the issue is BUG-6083: when a new best path is announced, it is not updated. Once the fix for this bug is merged I will let you know and ask you to test against the latest version of Be, which I'll provide to you. Meanwhile you can attach the new .cap and add more information about the message that contains the update, so we can try to replicate it or identify it from the capture. |
| Comment by Al Goddard [ 08/Jul/16 ] |
|
The pcap I sent was when the link was advertised, not after. I ran for a few minutes thinking that the advertisement would be captured. |
| Comment by Claudio David Gasparini [ 11/Jul/16 ] |
|
Hi Al, the withdrawal is not advertised; we expect to see an MP_UNREACH_NLRI. So it might be that the router is not sending the notification. |
| Comment by Al Goddard [ 11/Jul/16 ] |
|
Cisco already confirmed that updates were sent from the router. I thought the purpose of sending you the pcap file was for you to confirm that the updates were being handled properly by ODL? The distribute configuration has been configured since last year. |
| Comment by Milos Fabian [ 11/Jul/16 ] |
|
(In reply to Al Goddard from comment #18) Hi Al, |
| Comment by Al Goddard [ 15/Jul/16 ] |
|
I see that the filter: bgp.mp_reach_nlri_ipv4_prefix does not return any matches, but this filter: bgp.update.path_attribute.mp_reach_nlri.afi does return matches. Are you specifically looking not for an UPDATE message, but a bgp.mp_reach* message? |
| Comment by Milos Fabian [ 16/Jul/16 ] |
|
(In reply to Al Goddard from comment #20) We are looking for the specific MP_UNREACH_NLRI Path Attribute (included in an Update Message), which expresses that the link is no longer reachable, so it can be removed from the ODL BGP-LS topology. |
| Comment by Al Goddard [ 17/Jul/16 ] |
|
Please see Comment 16. There are hundreds of type 'bgp.update.path_attribute.mp_reach_nlri.afi', as this test was done when the link came up, and hence an mp_reach would be generated. |
| Comment by Claudio David Gasparini [ 18/Jul/16 ] |
|
Hi Al, it seems that you didn't receive the email I sent last week. I'll attach it here: Hi, Some things that need to be checked:
This will ensure that issue is in ODL side. Then provide the .cap to us for further analysis, attaching it to the bug. Regards, |
| Comment by Al Goddard [ 22/Jul/16 ] |
|
Cisco BGP DE are investigating router debugs + wireshark of a recent repro of this issue. I am attaching the relevant bgp.update.path_attribute.mp_unreach_nlri.afi packets for this test, when the link was shut down and a bgp withdrawal update was sent. The pcap clearly shows the packet was sent by the router and received on the controller. Why wasn't this link removed from the topology? |
| Comment by Al Goddard [ 22/Jul/16 ] |
|
These are the packets that contain MP_UNREACH updates for the link that was withdrawn. Summary extract from capture: Path Attribute - MP_UNREACH_NLRI |
| Comment by Al Goddard [ 22/Jul/16 ] |
|
Attachment unreach.txt has been added with description: MP_UNREACH capture from wireshark |
| Comment by Al Goddard [ 25/Jul/16 ] |
|
Cisco BGP DE has confirmed that withdraw update was sent by using a debug SMU for bgp and correlating with wireshark capture. This is an urgent issue pointing at ODL. Please continue to pursue to resolution. ------- We were able to turn on the actual date/time from the View menu. The withdrawal packet was sent out by CRS router within 2sec of the link being shut down. Wireshark also indicates the same. LC/0/13/CPU0:Jul 20 14:04:27.574 : ifmgr[197]: %PKT_INFRA-LINK-5-CHANGED : Interface TenGigE0/13/0/0, changed state to Administratively Down And the withdraws were packed into the packets at 14:04:29 Jul 20 14:04:29.387 default-bgp/spkr-tr2-gen 0/RP0/CPU0 t20 [GEN]:7664: CSCva25043 bgp_send_update(7664) withdraw to be formatted for pelem 10db8084 So investigation now shifts to ODL team. Please continue to communicate with them via 6160. |
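The "within 2sec" figure can be checked directly from the two timestamps quoted above. This is a small arithmetic sketch; the helper name is hypothetical, and it assumes both timestamps are on the same day and the same clock.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def delay_seconds(event_time, update_time):
    """Seconds elapsed between two same-day HH:MM:SS.mmm timestamps."""
    start = datetime.strptime(event_time, FMT)
    end = datetime.strptime(update_time, FMT)
    return (end - start).total_seconds()


link_down = "14:04:27.574"  # ifmgr: interface changed state to Administratively Down
withdraw = "14:04:29.387"   # bgp_send_update: withdraw to be formatted

delay = delay_seconds(link_down, withdraw)
print(f"{delay:.3f} s")  # prints "1.813 s"
```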
| Comment by Claudio David Gasparini [ 25/Jul/16 ] |
|
Hi Al, thanks for the update. Please attach the .cap. Regards, |
| Comment by Al Goddard [ 25/Jul/16 ] |
|
Attached is the .pcap file of the MP_UNREACH NLRI for the route withdrawal that is not shown in the bgp-ls datastore topology. |
| Comment by Al Goddard [ 25/Jul/16 ] |
|
Attachment bgpls.pcap.gz has been added with description: .pcap file showing route withdrawal |
| Comment by Al Goddard [ 25/Jul/16 ] |
|
Although reproduced on Be SR2, we are using Brocade bvc 3.0 in production (the problem is also observed on this release). I don't have any karaf logs for this issue. Shall we continue to work this via this bug, or defer to Brocade for handling escalation? |
| Comment by Ajay L [ 26/Jul/16 ] |
|
Hi Al, Once the link down event happens, can you check the LOC-RIB to see if link information is still present there? Also can you please enable bgpcep debugs and collect controller logs a minute before and a couple of minutes after the link is shut? Getting pcaps during those few minutes will also help as we can then correlate update messages to how the controller handles them log:set DEBUG org.opendaylight.bgpcep Thanks Getting "show mpls traffic-eng topology" and "show bgp link-state link-state" (with appropriate filters if number of entries in excessive) from the router before & after link down while collecting above output will help |
| Comment by Claudio David Gasparini [ 26/Jul/16 ] |
|
Hi Al, I was able to announce and withdraw the link using the same packets you provided. bgp.update.path_attribute.mp_reach_nlri.afi && bgp.ls.nlri_ipv4_interface_address == 10.0.7.98 && bgp.ls.nlri_ipv4_neighbor_address == 10.0.7.97 unreach from the second bgp.update.path_attribute.mp_unreach_nlri.afi && bgp.ls.nlri_ipv4_interface_address == 10.0.7.98 && bgp.ls.nlri_ipv4_neighbor_address == 10.0.7.97 The behavior was the expected one and everything worked fine. The link was announced and then removed correctly when it was withdrawn. Regards, |
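The two display filters above differ only in the path attribute name, so they can be generated from one template. The sketch below only builds the filter string and a tshark argument list (-r reads a capture file, -Y applies a display filter) and does not run anything; the function names are hypothetical, and the file name bgpls.pcap assumes the attached bgpls.pcap.gz has been decompressed.

```python
def ls_link_filter(attribute, iface, neigh):
    """Build a display filter for a BGP-LS link in a reach or unreach update."""
    if attribute not in ("mp_reach_nlri", "mp_unreach_nlri"):
        raise ValueError("attribute must be mp_reach_nlri or mp_unreach_nlri")
    return (
        f"bgp.update.path_attribute.{attribute}.afi"
        f" && bgp.ls.nlri_ipv4_interface_address == {iface}"
        f" && bgp.ls.nlri_ipv4_neighbor_address == {neigh}"
    )


def tshark_args(pcap_path, display_filter):
    """Argument list for reading a capture with a display filter applied."""
    return ["tshark", "-r", pcap_path, "-Y", display_filter]


withdraw_filter = ls_link_filter("mp_unreach_nlri", "10.0.7.98", "10.0.7.97")
print(" ".join(tshark_args("bgpls.pcap", f'"{withdraw_filter}"')))
```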
| Comment by Al Goddard [ 26/Jul/16 ] |
|
Just reproduced the issue today and captured ODL DEBUG/wireshark. Will update the bug with detailed information soon. The topology is quite large. The local rib after json formatting is 500M. This clearly looks like the topology not being updated. Here’s a timeline: RP/0/RP0/CPU0:Jul 26 09:24:34.451 : ifmgr[250]: %PKT_INFRA-LINK-3-UPDOWN : Interface Bundle-Ether218, changed state to Down Pre capture:
Post capture:
Link in pre/post topologies: [acg000@automation2 ~]$ grep link-id bgpls.odl1-1[23] | grep 10.0.7.9[78] bgpls.odl1-13: "link-id": "bgpls://Ospf:2/type=link&local-as=65060&local-domain=167772368&local-area=167898919&local-router=167772369&remote-as=65060&remote-domain=167772368&remote-area=167898919&remote-router=167772368&ipv4-iface=10.0.7.97&ipv4-neigh=10.0.7.98", Link only in pre local rib: [acg000@automation2 ~]$ grep ipv4-inter.*address bgpls.odl1-1[23]-rib | grep 10.0.7.9[78] |
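The link-id above packs the link's identifying fields into a single string, which makes grep output hard to read. The sketch below splits it into key/value pairs and renders the decimal router and domain values as dotted quads. Treating those values as 32-bit identifiers is an assumption, although remote-domain 167772368 decoding to 10.0.0.208 is consistent with the b10.0.0.208 shown in the router output earlier in this thread. The function names are hypothetical.

```python
import ipaddress


def parse_link_id(link_id):
    """Split a bgpls:// link-id into (protocol, dict of key/value fields)."""
    _scheme, rest = link_id.split("://", 1)
    protocol, _, query = rest.partition("/")
    fields = dict(pair.split("=", 1) for pair in query.split("&"))
    return protocol, fields


def dotted(value):
    """Render a decimal 32-bit identifier in dotted-quad form."""
    return str(ipaddress.IPv4Address(int(value)))


link_id = (
    "bgpls://Ospf:2/type=link&local-as=65060&local-domain=167772368"
    "&local-area=167898919&local-router=167772369&remote-as=65060"
    "&remote-domain=167772368&remote-area=167898919&remote-router=167772368"
    "&ipv4-iface=10.0.7.97&ipv4-neigh=10.0.7.98"
)

protocol, fields = parse_link_id(link_id)
print(protocol)
print("local-router:", dotted(fields["local-router"]))
print("remote-router:", dotted(fields["remote-router"]))
print("iface/neigh:", fields["ipv4-iface"], fields["ipv4-neigh"])
```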
| Comment by Lynn Rivera [ 26/Jul/16 ] |
|
Ajay, Here are the karaf logs. As Al mentioned, the loc rib is updated but the topology is not. As I previously mentioned, there are many ping pong messages. The pcap did not work since I used wrong interface during the test but I don't think you need that since we are seeing loc rib updated so we know we are getting the update. Let me know if you need us to rerun this test. Lynn |
| Comment by Lynn Rivera [ 26/Jul/16 ] |
|
Attachment bug6160.zip has been added with description: karaf log during link down |
| Comment by Claudio David Gasparini [ 26/Jul/16 ] |
|
Hi Lynn, based on the logs and the test I did, this is not an issue with the packets or how they are handled. This is an error that occurs earlier and therefore affects the topology update. The logs show IllegalStateException: New transaction PingPongTransaction Could you confirm that these ERROR logs are already seen before the link goes down, and provide some trace, if possible, of when they start to appear? Regards, |
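The question of whether the PingPongTransaction errors predate the link-down event can be answered by counting matching log lines before a cutoff time. The sketch below is illustrative only: it assumes each karaf.log line starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp, and the sample lines are synthetic, not taken from the attached logs. The cutoff reuses the link-down time from the timeline above, with the year assumed.

```python
NEEDLE = "PingPongTransaction"


def count_hits(lines, needle=NEEDLE):
    """Count lines that mention the needle."""
    return sum(1 for line in lines if needle in line)


def count_hits_before(lines, cutoff, needle=NEEDLE):
    """Count matching lines whose leading timestamp sorts before the cutoff.

    Relies on the assumed fixed-width timestamp prefix, so plain string
    comparison orders lines chronologically.
    """
    width = len(cutoff)
    return sum(1 for line in lines if needle in line and line[:width] < cutoff)


# Synthetic lines in the assumed format.
sample = [
    "2016-07-26 09:20:00,000 | ERROR | sample | IllegalStateException: New transaction PingPongTransaction",
    "2016-07-26 09:24:00,000 | INFO  | sample | unrelated message",
    "2016-07-26 09:25:00,000 | ERROR | sample | IllegalStateException: New transaction PingPongTransaction",
]

cutoff = "2016-07-26 09:24:34,451"  # link-down time, year assumed
before_link_down = count_hits_before(sample, cutoff)
total = count_hits(sample)
print(before_link_down, "of", total, "errors precede the link-down event")
```

A non-zero count before the cutoff would suggest the errors were already occurring before the link went down.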
| Comment by Ajay L [ 26/Jul/16 ] |
|
(In reply to Lynn Rivera from comment #34) Yep there are a large number of PingPongTransaction error messages. Also given the observation that on link down loc-rib is updated but not topology, this is most likely same issue as in https://bugs.opendaylight.org/show_bug.cgi?id=6111 |
| Comment by Lynn Rivera [ 26/Jul/16 ] |
|
Yes, I am confirming we have the PING PONG before and after the link up/down. Lynn |
| Comment by Ajay L [ 10/Aug/16 ] |
|
Fix for BUG-6111 and BUG-6342 has fixed this issue |