[NETVIRT-528] CSIT Sporadic failures - floating IP unreachable Created: 13/Mar/17  Updated: 04/Apr/17  Resolved: 04/Apr/17

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Reporter: Jamo Luhrsen Assignee: Alon Kochba
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
duplicates NETVIRT-568 CSIT Sporadic failures - missing tabl... Resolved
External issue ID: 7968

 Description   

Two instances are each assigned a floating IP, and connectivity is tested with a
ping from the control node. In this CSIT failure, one of the floating
IPs is not reachable; the other responds.

Digging into the flow dumps may point to what is wrong.
I checked that the floating IP addresses in specific flows looked
the same for the working and non-working floating IPs, but that's as
far as I went.

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-learn-carbon/192/archives/log.html.gz



 Comments   
Comment by Alon Kochba [ 14/Mar/17 ]

Note that v1 with neutron

Comment by Alon Kochba [ 14/Mar/17 ]

Note that networking-odl-v1 had a known issue where these failures would happen from time to time.
Koby should be able to verify from the dumps if this is the same issue.

Please update here if this also happens in v2.

Comment by Koby Aizer [ 15/Mar/17 ]

I looked into the dumps as well.

Alon is right, and there is indeed a sporadic bug in v1 with floating IPs. However, this is not the case here.

The failure in the report Jamo attached is caused by a missing table=21 rule for the VMs' private IPs. This looks very similar to NETVIRT-519 [1].

[1] https://bugs.opendaylight.org/show_bug.cgi?id=7938
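
A check like the one described above (looking for a table=21 rule per VM private IP) could be scripted roughly as follows. This is only a sketch: it assumes the dump is text in the usual `ovs-ofctl dump-flows` layout, with a `table=N` field and an `nw_dst=<ip>` match on each relevant line. The function name, the sample flow lines, and the IP addresses are invented for illustration and are not taken from the attached logs.

```python
import re

# Assumed dump-flows layout: each flow line has "table=<n>" and, for the
# flows of interest here, an "nw_dst=<ipv4>" match.
TABLE_RE = re.compile(r"\btable=(\d+)\b")
NW_DST_RE = re.compile(r"\bnw_dst=(\d{1,3}(?:\.\d{1,3}){3})")


def ips_missing_from_table(dump, table, expected_ips):
    """Return the expected IPs that have no nw_dst match in the given table."""
    present = set()
    for line in dump.splitlines():
        m = TABLE_RE.search(line)
        if not m or int(m.group(1)) != table:
            continue
        present.update(NW_DST_RE.findall(line))
    return sorted(set(expected_ips) - present)


# Hypothetical sample lines, made up for this sketch.
SAMPLE_DUMP = """\
 cookie=0x8000003, duration=90.1s, table=21, n_packets=4, n_bytes=392, priority=42,ip,metadata=0x222e2/0xfffffffe,nw_dst=10.1.1.3 actions=group:150001
 cookie=0x8000004, duration=90.1s, table=19, n_packets=4, n_bytes=392, priority=20,metadata=0x222e2/0xfffffffe,dl_dst=fa:16:3e:aa:bb:cc actions=goto_table:21
"""

if __name__ == "__main__":
    # 10.1.1.4 has no table=21 rule in the sample, so it is reported.
    print(ips_missing_from_table(SAMPLE_DUMP, 21, ["10.1.1.3", "10.1.1.4"]))
```

Running it against the dumps from a passing and a failing job would show whether the same private IPs are missing each time.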

Comment by Jamo Luhrsen [ 23/Mar/17 ]

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-upstream-stateful-carbon/242/archives/log.html.gz#s1-s1-s3-t13

Comment by Koby Aizer [ 27/Mar/17 ]

Root cause for this bug is NETVIRT-568

Comment by Sam Hague [ 03/Apr/17 ]

https://git.opendaylight.org/gerrit/#/c/53355/

Comment by Vivekanandan Narasimhan [ 04/Apr/17 ]

I am duping this bug to 8082, as both 7968 and 8082 are root-caused to missing flows in Table 19 for internal router interface MAC addresses.

From my analysis of [302] below (equivalent to [0] from Jamo), the root cause for both 7968 and the sub-issue of 7939 is improper flows in the L3-GW-MAC-TABLE (i.e., Table 19), which leave it unable to send packets down to Table 21.

In both failed scenarios:

a. 7968 - Ping from VMInstance to Floating-IP (see [301])

b. 7939 - The Add Multiple extra-route TC failure, with
Jamo's log at [0] equivalent to [302]

Table 81 is always populated correctly with the right MAC to answer ARP requests (and is cleaned up too), but Table 19 never has the matching router-internal-interface MAC flows, which results in traffic failures.
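
The Table 81 versus Table 19 comparison can be approximated with a small script. The sketch below assumes the dump is in the usual `ovs-ofctl dump-flows` text layout and that the router interface MAC appears literally in the flow text of both tables (for example in a `set_field` action in Table 81 and a `dl_dst` match in Table 19). The function names and sample flow lines are hypothetical and written only for illustration; they do not come from the logs at [301] or [302].

```python
import re
from collections import defaultdict

TABLE_RE = re.compile(r"\btable=(\d+)\b")
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def macs_by_table(dump):
    """Map table number -> set of MAC addresses seen anywhere in its flows."""
    tables = defaultdict(set)
    for line in dump.splitlines():
        m = TABLE_RE.search(line)
        if m:
            macs = (mac.lower() for mac in MAC_RE.findall(line))
            tables[int(m.group(1))].update(macs)
    return tables


def macs_missing_in_table19(dump):
    """MACs answered for in Table 81 that have no flow in Table 19."""
    tables = macs_by_table(dump)
    return sorted(tables[81] - tables[19])


# Hypothetical sample lines, made up for this sketch.
SAMPLE_DUMP = """\
 cookie=0x8220000, table=81, priority=100,arp,arp_tpa=10.1.1.1,arp_op=1 actions=set_field:fa:16:3e:aa:bb:cc->eth_src,IN_PORT
 cookie=0x8220000, table=81, priority=100,arp,arp_tpa=10.2.2.1,arp_op=1 actions=set_field:fa:16:3e:11:22:33->eth_src,IN_PORT
 cookie=0x8000009, table=19, priority=20,metadata=0x222e2/0xfffffffe,dl_dst=fa:16:3e:aa:bb:cc actions=goto_table:21
"""

if __name__ == "__main__":
    # Reports the MAC that Table 81 answers for but Table 19 does not match.
    print(macs_missing_in_table19(SAMPLE_DUMP))
```

Because the match is purely textual, any MAC that shows up in Table 81 for an unrelated reason would also be reported, so the output is a starting point for inspection rather than a verdict.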

If you can spare some time, please look into why the L3-GW-MAC-TABLE fails to program the router-internal-interface flows and sometimes does not remove old flows.

[301] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-upstream-stateful-carbon/242/archives/log.html.gz#s1-s1-s3

[302] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-stateful-carbon/250/archives/log.html.gz

Generated at Wed Feb 07 20:21:47 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.