[NETVIRT-509] Dissociates l3vpn from router and then Associates with network has 100% packet loss Created: 03/Mar/17 Updated: 03/May/18 Resolved: 14/Dec/17 |
|
| Status: | Resolved |
| Project: | netvirt |
| Component/s: | General |
| Affects Version/s: | Boron |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Suvitha Balu | Assignee: | Aswin Suryanarayanan |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 7893 |
| Description |
|
Steps: Attached the ODL Sandbox log which has ODL and OVS dump. Issue observed on both carbon and boron. |
| Comments |
| Comment by Suvitha Balu [ 03/Mar/17 ] |
| Comment by Vivekanandan Narasimhan [ 03/Mar/17 ] |
|
Hi Hanumant, This issue may be clearly related to the fix for Can you do a check in sandbox flows/fibEntries and let me know? Vivek |
| Comment by Hanamantagoud V Kandagal [ 03/Mar/17 ] |
|
Please find our analysis. A ping to 20.1.1.10/32 (VM_IP_NET20) from 10.1.1.9/32 (VM_IP_NET10) is being done.

Packet path:
table0 (in_port=2 actions=write_metadata:0x30000000000)
==> table41 (doesn't match any entry here)
==> table=42, n_packets=122, n_bytes=11728, priority=61010,ip,metadata=0x30000000000/0xfffff0000000000 actions=learn(table=252,idle_timeout=300,priority=61010,delete_learned,cookie=0x6900000,eth_type=0x800,NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],NXM_OF_IP_PROTO[],load:0x1->NXM_NX_REG5[0..7]),resubmit(,17)
==> table17
==> table19

On table19 the packet should have matched one of the two entries below, but it didn't. The packet count shows 0 for both.
cookie=0x8000009, duration=26.309s, table=19, n_packets=0, n_bytes=0, priority=20,metadata=0x222f2/0xfffffffe,dl_dst=fe:16:3e:34:de:4a actions=goto_table:21
cookie=0x8000009, duration=26.119s, table=19, n_packets=0, n_bytes=0, priority=20,metadata=0x222f2/0xfffffffe,dl_dst=fe:16:3e:8a:e2:a9 actions=goto_table:21

We suspect the packet's destination MAC is tampered with, hence it's not able to match the dl_dst in table=19. |
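The zero-hit check described above can be automated against a saved flow dump. The following is only a minimal sketch, assuming the dump is plain `ovs-ofctl dump-flows` style text as pasted in this comment. The helper names `parse_flow` and `unhit_flows` are hypothetical and not part of any ODL or OVS tooling.

```python
import re

# Flow lines copied from the analysis in this comment.
SAMPLE_FLOWS = [
    "table=42, n_packets=122, n_bytes=11728, priority=61010,ip,"
    "metadata=0x30000000000/0xfffff0000000000 actions=learn(table=252,"
    "idle_timeout=300,priority=61010,delete_learned,cookie=0x6900000,"
    "eth_type=0x800,NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],"
    "NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],NXM_OF_IP_PROTO[],"
    "load:0x1->NXM_NX_REG5[0..7]),resubmit(,17)",
    "cookie=0x8000009, duration=26.309s, table=19, n_packets=0, n_bytes=0, "
    "priority=20,metadata=0x222f2/0xfffffffe,dl_dst=fe:16:3e:34:de:4a "
    "actions=goto_table:21",
    "cookie=0x8000009, duration=26.119s, table=19, n_packets=0, n_bytes=0, "
    "priority=20,metadata=0x222f2/0xfffffffe,dl_dst=fe:16:3e:8a:e2:a9 "
    "actions=goto_table:21",
]


def parse_flow(line):
    """Extract table id, packet count and dl_dst (if any) from one flow line.

    Hypothetical helper: returns None when the line has no table/n_packets.
    """
    # The first "table=" on a line is the flow's own table; a later one can
    # appear inside a learn() action, so only the first match is used.
    table = re.search(r"\btable=(\d+)", line)
    packets = re.search(r"\bn_packets=(\d+)", line)
    if not table or not packets:
        return None
    dl_dst = re.search(r"\bdl_dst=([0-9a-fA-F:]{17})", line)
    return {
        "table": int(table.group(1)),
        "n_packets": int(packets.group(1)),
        "dl_dst": dl_dst.group(1) if dl_dst else None,
    }


def unhit_flows(lines, table):
    """Return parsed flows in the given table whose packet counter is 0."""
    flows = (parse_flow(line) for line in lines)
    return [f for f in flows if f and f["table"] == table and f["n_packets"] == 0]


if __name__ == "__main__":
    for flow in unhit_flows(SAMPLE_FLOWS, 19):
        print("table 19 flow never hit, dl_dst =", flow["dl_dst"])
```

Listing the `dl_dst` values of the unhit table 19 flows makes it easy to compare them against the destination MAC the VM is actually sending.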
| Comment by Vivekanandan Narasimhan [ 16/Mar/17 ] |
|
Hi Hanumant, Thanks for the analysis. It looks like, with the steps given by the submitter here, the VM would be completely unaware that it left a router-based VPN and re-entered a network-based VPN. The VM might be holding the old MAC that ARP resolved for the gateway IP address 10.1.1.1, and so it might have incorrectly used the same old gateway MAC address to send the IP packets. Hi Suvitha, Can we please check with Wireshark whether the VM attempts ARPing after it sees IP packet losses, and whether through that ARPing it gets the new MAC address now applied for 10.1.1.1, which is fe:xx:xx:xx:xx rather than the router-interface MAC address? Vivek |
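Where a packet capture is not possible, the stale-MAC hypothesis could also be checked from the VM's neighbour table. The sketch below assumes the guest can run `ip neigh show` and that its output has been captured as text. The function names, the sample output and both MAC values are made up for illustration and are not taken from the sandbox logs.

```python
def neighbor_mac(ip_neigh_output, ip):
    """Return the lladdr recorded for `ip` in `ip neigh show` output, or None."""
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Typical entry: "<ip> dev <ifname> lladdr <mac> <STATE>"
        if fields and fields[0] == ip and "lladdr" in fields:
            return fields[fields.index("lladdr") + 1].lower()
    return None


def has_stale_gateway_mac(ip_neigh_output, gateway_ip, expected_mac):
    """True when the VM still caches a MAC other than the expected gateway MAC."""
    mac = neighbor_mac(ip_neigh_output, gateway_ip)
    return mac is not None and mac != expected_mac.lower()


if __name__ == "__main__":
    # Hypothetical capture: both MAC addresses here are placeholders.
    sample = "10.1.1.1 dev eth0 lladdr fa:16:3e:00:00:01 STALE\n"
    print(has_stale_gateway_mac(sample, "10.1.1.1", "fe:16:3e:00:00:02"))
```

A `True` result would be consistent with the VM still using the old router-interface MAC. It would not by itself prove that this is why table 19 is not matched.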
| Comment by Jamo Luhrsen [ 20/Mar/17 ] |
|
(In reply to Vivekanandan Narasimhan from comment #4) right now, we don't have a way to do pcaps in these openstack instances, https://git.opendaylight.org/gerrit/#/c/45441/ This is the second time in maybe 6 months that we've wanted this, which JamO |
| Comment by Jamo Luhrsen [ 20/Mar/17 ] |
|
(In reply to Jamo Luhrsen from comment #5) I meant, Suvitha, not Hanumant |
| Comment by Hari Krishna [ 21/Mar/17 ] |
|
Trying to reproduce this in local setup. Went through the pipeline, everything looked ok. |
| Comment by Suvitha Balu [ 21/Mar/17 ] |
|
Sure Jamo, I can explore this. |
| Comment by Vivekanandan Narasimhan [ 03/Apr/17 ] |
|
Hi Hari, Do we have any updates on this? Vivek |
| Comment by Hari Krishna [ 25/Apr/17 ] |
|
ETA - 5th May 2017 |
| Comment by Hari Krishna [ 01/May/17 ] |
|
Hi Suvitha, I tried to reproduce this bug locally and wasn't able to. Could you please try it again and let me know if you still see this issue? Regards |
| Comment by Suvitha Balu [ 03/May/17 ] |
|
Log from Sandbox: |
| Comment by Hari Krishna [ 03/May/17 ] |
|
Hi Suvitha, Thank you for running it again. The difference between the CSIT setup and mine is that I don't configure ACLs: I ran the test without ACLs enabled. This seems to be the root cause. I am putting the ETA as 10th May 2017, as I need to investigate this further. There is nothing to be done from the L3VPN side. Regards |
| Comment by Hari Krishna [ 05/May/17 ] |
|
Hi Suvitha, I have tried to recreate this manually, and with manual steps it is working. I have tried this multiple times and don’t see an issue. Could you please add a delay of 2-3 seconds after router disassociation, and another delay of 2-3 seconds after adding networks to the L3VPN and before you ping? Can you try this and let me know? Regards |
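The delays requested above could be sketched as a small step runner that pauses between operations. This is only an illustration in Python. The real CSIT suite is not shown in this issue, so the step names and the `run_with_settle` helper are hypothetical, and the lambdas stand in for the actual disassociate, associate and ping calls.

```python
import time


def run_with_settle(steps, settle_seconds=3.0, sleep=time.sleep):
    """Run (name, callable) steps in order, pausing after each one but the last.

    `sleep` is injectable so the pauses can be observed without really waiting.
    """
    results = []
    for index, (name, action) in enumerate(steps):
        results.append((name, action()))
        if index < len(steps) - 1:
            sleep(settle_seconds)  # give the controller time to program flows
    return results


if __name__ == "__main__":
    # Placeholder actions: replace with the real test operations.
    steps = [
        ("dissociate_l3vpn_from_router", lambda: "done"),
        ("associate_networks_to_l3vpn", lambda: "done"),
        ("ping_between_vms", lambda: "done"),
    ]
    for name, result in run_with_settle(steps, settle_seconds=3.0):
        print(name, result)
```

With three steps this yields one pause after the router disassociation and one after the network association, before the ping runs.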
| Comment by Hari Krishna [ 08/May/17 ] |
|
Hi Suvitha, Regards |
| Comment by Jamo Luhrsen [ 08/May/17 ] |
|
(In reply to Hari Krishna from comment #15) Hari, how long of a delay do you suggest? I don't think this problem is I think the idea of increasing ping count to 20 for debugging is best, Suvitha, any chance you know of a failure for this bug in a releng job? the Thanks, |
| Comment by Hari Krishna [ 09/May/17 ] |
|
Hi Jamo/Suvitha, In our local setup what I have seen is after disassociate from router and associating to networks. The ACL learn tables namely I am following up with the ACL team separately to find out, if the above two observations are valid and what is the expectation from the ACL learn tables. Regards |
| Comment by Hari Krishna [ 09/May/17 ] |
|
Hi Som the steps to reproduce this issue. #Create Neutron networks I have anyway sent you the log files, I am assigning this bug to you to have a look. Regards Hi Som In our local setup what I have seen is after disassociate from router and associating to networks. The ACL learn tables namely Yes, this testing is done on the latest Nitrogen. Regards |
| Comment by Somashekar Byrappa [ 09/May/17 ] |
|
Hi Slava, As this issue is related to ACL in Learn mode, I am assigning this one to you. Thanks, |
| Comment by Aswin Suryanarayanan [ 14/Dec/17 ] |
|
Learn mode is deprecated. |