[NETVIRT-515] CSIT Sporadic failures - tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_reboot Created: 07/Mar/17 Updated: 15/Dec/17 Resolved: 27/Sep/17 |
|
| Status: | Resolved |
| Project: | netvirt |
| Component/s: | General |
| Affects Version/s: | Carbon |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Jamo Luhrsen | Assignee: | Chetan Arakere Gowdru |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 7919 |
| Description |
|
|
| Comments |
| Comment by Jamo Luhrsen [ 08/Mar/17 ] |
|
(In reply to Jamo Luhrsen from comment #0) |
| Comment by Chetan Arakere Gowdru [ 04/Apr/17 ] |
|
Attachment test_server_connectivity_reboot.txt has been added with description: logs specific to TC - test_server_connectivity_reboot |
| Comment by Chetan Arakere Gowdru [ 04/Apr/17 ] |
|
Hi Koby/Vivek,

As pointed out earlier, it looks like the VM response for the FIP is hitting (26->46), as a result of which the packets were punted to the controller and lost. (I can see packets continuously punted to the controller for the full 2-minute duration of the retry.)

Example (flows from my local system):

1) cookie=0x8000004, duration=6693.738s, table=26, n_packets=0, n_bytes=0, priority=10,ip,metadata=0x30d42/0xfffffe,nw_src=10.0.0.5 actions=set_field:172.160.0.100->ip_src,write_metadata:0x30d48/0xfffffe,goto_table:28

If such packets had to be punted to the controller, then the likely cause is a missing rule-1 (the internal-IP to external-IP DNAT translation rule). From the NAT logs, I can clearly see that the request to program rule-1 was sent, but I do observe OpenFlow errors during this TC run (I'm not sure whether these are related to the rule-1 programming).

NAT logs
Openflow Error

Unfortunately, in these tempest TCs the teardown logs were taken post clean-up, so they are not much use for debugging this issue further. With the current inputs, I'm not able to triage this issue any further. Any inputs appreciated.

PFA (logs specific to this TC run).

Thanks, |
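For anyone repeating this check, a minimal sketch of how a flow dump line like rule-1 above could be parsed to see whether it was ever hit. It assumes the dump uses the same comma-separated key=value layout as the example; the helper names `parse_flow` and `unhit_flows` are hypothetical and not part of netvirt or OVS.

```python
import re

# Example flow copied from the comment above.
FLOW = (
    "cookie=0x8000004, duration=6693.738s, table=26, n_packets=0, n_bytes=0, "
    "priority=10,ip,metadata=0x30d42/0xfffffe,nw_src=10.0.0.5 "
    "actions=set_field:172.160.0.100->ip_src,"
    "write_metadata:0x30d48/0xfffffe,goto_table:28"
)


def parse_flow(line):
    """Split one flow dump line into a dict of fields plus its actions string."""
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in re.split(r",\s*", head):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key.strip()] = value.strip()
        elif token:
            # Bare match keyword such as "ip".
            fields[token.strip()] = True
    fields["actions"] = actions
    return fields


def unhit_flows(lines, table):
    """Return parsed flows in the given table whose packet counter is still zero."""
    flows = [parse_flow(line) for line in lines if line.strip()]
    return [
        f for f in flows
        if f.get("table") == str(table) and int(f.get("n_packets", "0")) == 0
    ]


if __name__ == "__main__":
    for flow in unhit_flows([FLOW], table=26):
        print("never hit:", flow.get("nw_src"), "->", flow["actions"])
```

A zero `n_packets` on the table 26 rule, as in the example, only shows that the rule was not matched on that system; it does not by itself say why the traffic took the punt path.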
| Comment by Chetan Arakere Gowdru [ 11/Apr/17 ] |
|
Hi Chetan,

I know you're still debugging tempest test failures. It seems that most issues around the tempest tests are resolved, and I have a feeling the rest of the bug fixes have contributed to that (fixes for table=21 rules, DHCP, etc). However, I can still see "total meltdowns" here and there (Jamo has reported a few), like this report:

Just wanted to update you on my analysis so far in case you run into debugging those (we're entering a holiday this week, so I won't be able to continue). The tests are basically failing on the inability to connect to the VM FIP. My feeling is that one of the relevant ARP responder rules is missing, and therefore when the connection to the VM is attempted, ARP requests are constantly sent and punted to the controller. I guess adding more logs around the installation of ARP responder rules for router interfaces might help here, or maybe even adding temporary INFO prints for ARP requests punted to the controller until we resolve those tempest issues.

Figured this might save you some time in case you're looking into those failures this week.

Have a nice week, |
| Comment by Chetan Arakere Gowdru [ 11/Apr/17 ] |
|
Hi Koby/All,

In order to investigate this issue further, I have added info logs as suggested.

1) Added info logs for the ARP responder install flow (to print the flow name, which will contain the FIP/VM router interface IP, e.g. "Arp:tbl_81:lport_2:gw_192.168.225.5")

I hope these info logs are good enough to identify whether any ARP responder rules were missed, resulting in this failure.

https://git.opendaylight.org/gerrit/#/c/54593/

Thanks, |
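Once such logs are available, a rough sketch of how they could be scanned for missing ARP responder flows. It assumes flow names follow the single example above ("Arp:tbl_<table>:lport_<n>:gw_<ip>"); the surrounding log line text, the list of expected IPs, and the function name `missing_arp_responders` are hypothetical.

```python
import re

# Pattern inferred from the one example name "Arp:tbl_81:lport_2:gw_192.168.225.5".
ARP_FLOW_NAME = re.compile(r"Arp:tbl_(\d+):lport_(\d+):gw_(\d+(?:\.\d+){3})")


def missing_arp_responders(log_lines, expected_ips):
    """Return the expected IPs for which no ARP responder flow name was logged."""
    seen = {
        match.group(3)
        for line in log_lines
        for match in ARP_FLOW_NAME.finditer(line)
    }
    return sorted(set(expected_ips) - seen)


if __name__ == "__main__":
    # Hypothetical log line; only the flow name format comes from the comment above.
    logs = ["INFO installing flow Arp:tbl_81:lport_2:gw_192.168.225.5"]
    print(missing_arp_responders(logs, ["192.168.225.5", "192.168.225.9"]))
```

Any IP this reports would be a candidate for the missing-rule theory, provided the expected list really covers every FIP and router interface used by the test.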
| Comment by Jamo Luhrsen [ 25/Apr/17 ] |
|
tempest failures appear to mostly have been resolved. We can re-open any |
| Comment by Jamo Luhrsen [ 05/May/17 ] |
| Comment by Jamo Luhrsen [ 03/Aug/17 ] |
| Comment by Jamo Luhrsen [ 08/Aug/17 ] |
| Comment by Jamo Luhrsen [ 27/Sep/17 ] |
|
no longer seen in CSIT |