[NETVIRT-537] CSIT Sporadic failures - TestNetworkAdvancedServerOps.test_server_connectivity_pause_unpause Created: 15/Mar/17  Updated: 06/Apr/18  Resolved: 06/Apr/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Priority: Highest
Reporter: Jamo Luhrsen
Assignee: Chetan Arakere Gowdru
Resolution: Cannot Reproduce
Votes: 0
Labels: csit:failures, csit:snat-conntrack, csit:sporadic
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7995

 Description   

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-transparent-carbon/413/archives/log.html.gz#s1-s2-s1-t10



 Comments   
Comment by Chetan Arakere Gowdru [ 11/Apr/17 ]

Hi Chetan,

I know you're still debugging the tempest test failures. It seems that most of the tempest issues are resolved, and I have a feeling the other bug fixes have contributed to that (fixes for table=21 rules, DHCP, etc).

However, I can still see "total meltdowns" here and there (Jamo has reported a few), like this report:
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-learn-carbon/209/archives/log.html

Just wanted to update you on my analysis so far in case you end up debugging those (we're entering a holiday this week, so I won't be able to continue).

The tests are basically failing because they cannot connect to the VM FIP.
While the test is trying to connect to the VM, I’m constantly seeing the ARP counters in ODL increasing:
2017-04-09 17:36:40,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +4, auc.arp_req_rcv: +4
2017-04-09 17:36:41,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +3, auc.arp_req_rcv: +3
2017-04-09 17:36:42,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +3, auc.arp_req_rcv: +3

My feeling is that one of the relevant ARP responder rules is missing, and therefore when the connection to the VM is attempted, ARP requests are constantly sent and punted to the controller.
It's impossible to tell from the counters whether those ARP requests are directed to the FIP address or to the VM router interface, but my guess is that they are directed to the VM router interface (because there are prints showing the FIP ARP responder rules being installed).
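
To quantify how fast these ARP requests are being punted, the counter lines can be tallied with a short script. This is a minimal sketch, assuming the CountersDumperThread line format matches the three lines quoted above; the function name `tally_counter_deltas` is illustrative and not part of ODL.

```python
import re
from collections import Counter

# Matches "counter.name: +N" pairs. The format is inferred from the
# CountersDumperThread lines quoted in this comment (an assumption).
DELTA_RE = re.compile(r"([\w.]+): \+(\d+)")


def tally_counter_deltas(lines):
    """Sum the per-interval deltas of every counter found in the given log lines."""
    totals = Counter()
    for line in lines:
        if "CountersDumperThread" not in line:
            continue
        # The counter deltas live in the last "|"-separated field.
        payload = line.rsplit("|", 1)[-1]
        for name, delta in DELTA_RE.findall(payload):
            totals[name] += int(delta)
    return totals


SAMPLE = [
    "2017-04-09 17:36:40,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +4, auc.arp_req_rcv: +4",
    "2017-04-09 17:36:41,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +3, auc.arp_req_rcv: +3",
    "2017-04-09 17:36:42,781 | INFO | CountersThread | CountersDumperThread | 294 - org.opendaylight.infrautils.counters-impl - 1.1.0.SNAPSHOT | auc.arp_req_rcv_notification: +3, auc.arp_req_rcv: +3",
]

totals = tally_counter_deltas(SAMPLE)
for name, total in sorted(totals.items()):
    print(f"{name}: {total}")
```

Running the same function over the log for the whole test window would show whether the ARP request rate stays elevated for the duration of the connection attempts.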

I guess adding more logs around the installation of ARP responder rules for router interfaces might help here, or maybe even adding temporary INFO prints for ARP requests punted to the controller until we resolve those tempest issues.

Figured this might save you some time in case you're looking into those failures this week.

Have a nice week,
Koby

Comment by Chetan Arakere Gowdru [ 11/Apr/17 ]

Hi Koby/All,

To investigate this issue further, I have added INFO logs as suggested.

1) Added INFO logs for the ARP responder install flow, printing the flow name, which contains the FIP/VM router interface IP (e.g. "Arp:tbl_81:lport_2:gw_192.168.225.5").
2) Changed the log level from TRACE to INFO for ARP requests punted to the controller.
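
Once these logs are in place, the flow names can be scanned to check which IPs have an ARP responder flow installed. This is a rough sketch only: the `Arp:tbl_<table>:lport_<port>:gw_<ip>` pattern is inferred from the single example above, and the log message text, the second expected IP, and the helper names are hypothetical.

```python
import ipaddress
import re

# Hypothetical pattern, inferred only from the one example flow name
# "Arp:tbl_81:lport_2:gw_192.168.225.5"; real flow names may differ.
FLOW_NAME_RE = re.compile(r"Arp:tbl_(\d+):lport_(\d+):gw_(\d+(?:\.\d+){3})")


def responder_ips(log_lines):
    """Collect the IPs that appear in ARP responder flow names in the log."""
    ips = set()
    for line in log_lines:
        for _table, _lport, ip in FLOW_NAME_RE.findall(line):
            ips.add(ipaddress.ip_address(ip))
    return ips


def missing_responders(log_lines, expected_ips):
    """Return the expected IPs for which no ARP responder flow name was logged."""
    installed = responder_ips(log_lines)
    return [ip for ip in expected_ips if ipaddress.ip_address(ip) not in installed]


# The message wording below is made up for illustration; only the flow name
# comes from this comment. 192.168.225.1 is a hypothetical router interface IP.
log_lines = [
    "INFO | Installing ARP responder flow Arp:tbl_81:lport_2:gw_192.168.225.5",
]
expected = ["192.168.225.5", "192.168.225.1"]

missing = missing_responders(log_lines, expected)
print("IPs without a logged ARP responder flow:", missing)
```

Any IP reported as missing would be a candidate for the rule that is causing ARP requests to be punted to the controller.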

I hope these INFO logs are enough to identify whether any missing ARP responder rules are causing this failure.

https://git.opendaylight.org/gerrit/#/c/54593/

Thanks,
Chetan

Comment by Jamo Luhrsen [ 25/Apr/17 ]

Tempest failures appear to have mostly been resolved. We can re-open any of the tempest bugs if/when they appear again in CSIT.

Comment by Jamo Luhrsen [ 08/Jun/17 ]

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-stateful-carbon/657/log.html.gz#s1-s2-s1

Comment by Jamo Luhrsen [ 20/Mar/18 ]

seen in our snat-conntrack job with Oxygen:

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-queens-upstream-stateful-snat-conntrack-oxygen/214/

Generated at Wed Feb 07 20:21:49 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.