[NETVIRT-1080] PNF failed after FIP detached Created: 06/Feb/18  Updated: 15/May/18  Resolved: 15/May/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Nitrogen, Oxygen
Fix Version/s: Oxygen-SR2, Fluorine

Type: Bug Priority: Medium
Reporter: Hari Prasidh Assignee: Vinh Nguyen
Resolution: Done Votes: 0
Labels: csit:failures
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ODL:Nitrogen-SR1 3 nodes
OpenStack: Pike 3 nodes (1 controller, 2 compute)


Attachments: OVS_info.zip (Zip Archive), SandboxJobs.zip (Zip Archive), log_full.html.gz (File)

 Description   

Reproduction steps:
1.External NW creation
2.Internal NW creation
3.Router creation and GW/IF setting
4.VM creation
5.SNAT confirmation
OK
6.FIP attach
7.DNAT confirmation
OK
8.FIP detach
9.SNAT confirmation
NG



 Comments   
Comment by Vinh Nguyen [ 26/Feb/18 ]

Investigation findings so far:

1) Could not reproduce the issue using Nitrogen SR2. Please re-test with SR2 or later.

2) The attached OVS_info.zip doesn't seem to relate to the setup in log_full.html.gz

3) In the OVS_info.zip setup, the external gateway doesn't seem to be up and running.

    The external network group entry on all three nodes drops external traffic:

      cookie=0x8000004, duration=7288.519s, table=21, n_packets=1910, n_bytes=187180, priority=10,ip,metadata=0x324b2/0xfffffe actions=group:225000

      group_id=225000,type=all,bucket=actions=drop

This indicates that the external network group installer failed to detect the MAC address of the external gateway.
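
To spot this condition across several nodes, the group dump can be scanned for groups whose buckets only drop. The following is a minimal sketch, assuming the text comes from `ovs-ofctl dump-groups` and that each bucket is introduced by `bucket=`; the helper name and the second sample line are hypothetical and used only for contrast.

```python
def drop_only_groups(dump_groups_text):
    """Return the ids of groups whose buckets all have 'drop' as their only action."""
    dropped = []
    for line in dump_groups_text.splitlines():
        line = line.strip()
        if not line.startswith("group_id="):
            continue  # skip reply headers and blank lines
        group_id = int(line.split(",", 1)[0].split("=", 1)[1])
        # Every bucket begins with "bucket="; the text before the first one is the header.
        buckets = line.split("bucket=")[1:]
        actions = [b.split("actions=", 1)[1].rstrip(",") for b in buckets if "actions=" in b]
        if actions and all(a == "drop" for a in actions):
            dropped.append(group_id)
    return dropped


# First line is taken from this ticket; the second is a made-up healthy group.
sample = """\
group_id=225000,type=all,bucket=actions=drop
group_id=225001,type=all,bucket=actions=set_field:fa:16:3e:00:00:01->eth_dst,output:2
"""
print(drop_only_groups(sample))  # [225000]
```

A group reported here means external traffic sent to it is discarded on that node.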

 

4) In the scenario captured in log_full.html.gz, I don't see the issue described in 3).

   Could we retry the scenario and send me the latest log?

Comment by Hari Prasidh [ 27/Feb/18 ]

If you look at the failed test scenario below in log.html, you can find the above issue:
Ping External Network PNF from Vm Instance 1 (non-critical).

Also, recent CSIT jobs do not include the test scenarios I listed in the reproduction steps.

Comment by Hari Prasidh [ 01/Mar/18 ]

The issue can be reproduced in sandbox jobs:

https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/netvirt-csit-1node-openstack-pike-upstream-stateful-snat-conntrack-nitrogen/15/robot-plugin/log_full.html.gz

https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/netvirt-csit-1node-openstack-pike-upstream-stateful-snat-conntrack-nitrogen/17/robot-plugin/log_full.html.gz

Please find the logs in the attachment SandboxJobs.zip.

Comment by Sam Hague [ 08/Mar/18 ]

This is also hitting oxygen: https://jenkins.opendaylight.org/releng/me/my-views/view/netvirt-oxygen-queens/job/netvirt-csit-1node-openstack-queens-upstream-stateful-oxygen/207/

Comment by Vinh Nguyen [ 10/Mar/18 ]

This issue is not related to the PNF/SNAT issue in recent CSIT. It is found in the sandbox, where extra tests are appended to the current external-network test cases. The additional test cases are:

 

  • Delete the FIP for VM instance1: PASS
  • SNAT TCP connection to External Gateway From VM Instance1 : PASS
  • SNAT UDP connection to External Gateway From VM Instance1 : PASS
  • Ping External Network PNF from Vm Instance 1: FAIL

Here, the PNF ping fails once the FIP is deleted. We would expect the PNF scenario to continue working via SNAT.

 

Comment by Vinh Nguyen [ 12/Mar/18 ]

Updated the title to 'PNF failed after FIP Detached'.
Reason: Based on the attached CSIT report, the SNAT TCP/UDP connection verification passed;
the failure was in the PNF verification after the FIP was detached.

Investigation:

Three nodes, control, compute1, compute2

1.External NW creation
2.Internal NW creation
3.Router creation and GW/IF setting
4.VMs creation: VM1 on compute1 node, VM2 on compute2 node

The PNF SubnetRoute flow entries are installed on ALL 3 nodes

cookie=0x8000003, duration=339.441s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x138c030d42/0xfffffffffe,goto_table:22
cookie=0x8000004, duration=339.441s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.255 actions=drop
cookie=0x8000004, duration=902.422s, table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535

5.SNAT confirmation
6.FIPs attach
7.DNAT confirmation
8.FIPs for VMs detach
9.VM pings external PNF instances: FAILED
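
The `metadata` match in the SubnetRoute flows listed in this comment can be tied back to a VPN id. The sketch below assumes the VPN id is stored in the masked metadata bits shifted left by one position; that is an inference from the `0xfffffe` mask, and it is consistent with `vpn-id` 100001 shown in the conntrack-mode data later in this ticket. The function name is hypothetical.

```python
VRF_ID_MASK = 0xfffffe  # mask used in the table 21 flow matches in this ticket


def vpn_id_from_metadata(metadata):
    """Recover the VPN id, assuming it occupies the masked bits shifted left by one."""
    return (metadata & VRF_ID_MASK) >> 1


print(vpn_id_from_metadata(0x30d42))  # 100001
```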

Problem:

The PNF subnetRoute flow entries are removed from the OVS node that hosted the VM after the FIP is detached (in step 8).
Hence traffic from the VM on that OVS node to the PNF instance is no longer possible.

The PNF subnetRoute flow entries are removed on the dpn when the FIP port is the last port for the VPN on that dpn:

https://github.com/opendaylight/netvirt/blob/master/vpnmanager/impl/src/main/java/org/opendaylight/netvirt/vpnmanager/VpnFootprintService.java#L352

The VpnToDpnList for the external subnet after FIP is attached (step 6):
{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        { "ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP" },
        { "ip-address": "192.168.56.13/32", "ip-address-source": "FloatingIP" }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 8796748560798,
      "ip-addresses": [
        { "ip-address": "192.168.56.17/32", "ip-address-source": "FloatingIP" }
      ]
    }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}

Notes:

  • Two compute nodes, dpnid: 8796751999625, 8796748560798
  • Ports on 8796751999625:
    + 192.168.56.18: router external GW interface
    + 192.168.56.13: FIP for VM1
  • Ports on 8796748560798:
    + 192.168.56.17: FIP for VM2

The VpnToDpnList for the external subnet after FIP is deleted (step 8):

{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        { "ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP" }
      ]
    },
    { "dpn-state": "inactive", "dpnId": 8796748560798 }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}

After detaching the FIP on 8796748560798, the vpn-to-dpn-list entry for that dpn in the external subnet VPN is empty, so
fibManager.cleanUpDpnForVpn is called to clean up the PNF flow entries.

https://github.com/opendaylight/netvirt/blob/master/vpnmanager/impl/src/main/java/org/opendaylight/netvirt/vpnmanager/VpnFootprintService.java#L354
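
The change between the two VpnToDpnList snapshots can be checked mechanically. This is a minimal sketch, assuming both snapshots are loaded as Python dictionaries with the same keys as the JSON in this comment; the function name is hypothetical and the data is trimmed to the fields that matter here.

```python
def newly_inactive_dpns(before, after):
    """Return dpnIds that were active in 'before' and are inactive in 'after'."""
    def states(entry):
        return {d["dpnId"]: d["dpn-state"] for d in entry["vpn-to-dpn-list"]}

    old, new = states(before), states(after)
    return sorted(d for d, s in old.items() if s == "active" and new.get(d) == "inactive")


# Trimmed copies of the snapshots from steps 6 and 8.
after_attach = {"vpn-to-dpn-list": [
    {"dpn-state": "active", "dpnId": 8796751999625},
    {"dpn-state": "active", "dpnId": 8796748560798},
]}
after_detach = {"vpn-to-dpn-list": [
    {"dpn-state": "active", "dpnId": 8796751999625},
    {"dpn-state": "inactive", "dpnId": 8796748560798},
]}
print(newly_inactive_dpns(after_attach, after_detach))  # [8796748560798]
```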

Suggested solution:

  • Method FibManager.cleanUpDpnForVpn cleans up the flow entries associated with a VPN, such as
    SubnetRoute, Broadcast, etc. For an internal VPN, these flow entries are created
    when at least one VPN interface exists on the VPN and should be removed when the last VPN
    interface is removed.
  • For an external subnet VPN, the flow entries mentioned above are created when the subnet is created.
  • Therefore, when deleting the last VPN interface on an external subnet VPN, simply remove the VpnToDpnList
    entry associated with the VPN. The DPN cleanup for the external subnet VPN will be done when the external
    subnet is deleted.
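
The suggested behaviour can be sketched as a guard around the cleanup call. This is an illustrative Python model only, not the netvirt Java code: the function, its parameters, and the dictionary standing in for VpnToDpnList are all hypothetical.

```python
def on_last_port_removed(vpn_to_dpn, dpn_id, is_external_subnet_vpn, cleanup_dpn):
    """Handle removal of the last port of a VPN on a dpn.

    Returns True if the dpn cleanup (SubnetRoute, Broadcast, ... flows) was invoked.
    """
    # The footprint entry is dropped in both cases.
    vpn_to_dpn.pop(dpn_id, None)
    if is_external_subnet_vpn:
        # Flows were created with the subnet, so they stay until the subnet is deleted.
        return False
    cleanup_dpn(dpn_id)
    return True


cleaned = []
footprint = {8796748560798: {"dpn-state": "active"}}
print(on_last_port_removed(footprint, 8796748560798, True, cleaned.append))  # False
print(cleaned)  # []
```

With `is_external_subnet_vpn=False` the same call would invoke the cleanup, which matches the existing behaviour for internal VPNs.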
Comment by Vinh Nguyen [ 12/Mar/18 ]

Patch: 

https://git.opendaylight.org/gerrit/#/c/69102/

 

CSIT verification pending.

Comment by Jamo Luhrsen [ 13/Mar/18 ]

Reading the commit message in the patch, it makes sense why we
could lose connectivity (flow removed), but I think the connectivity does eventually return. What is making that
happen?

commit message:

Problem:
Deleting last FIP port on dpn also deletes the PNF flow
entries on the OVS node.
Solution:
Don't invoke fibManager.cleanUpDpnForVpn (which removes
the PNF flows) when last port on external subnet vpn is
deleted on the dpn.

Comment by Vinh Nguyen [ 29/Mar/18 ]

Sorry, the previous analysis was incorrect.

Revised analysis:

I found that the problem occurs only when:

  • NAT conntrack mode is used.
  • The deleted FIP VM is on the NAPT switch.

The vpn-to-dpn-list entry for the external subnet on each compute dpn contains a single IP address:
the FIP. The external router GW interface is contained in the vpn-to-dpn-list entry of the control node.

If the FIP on the NAPT switch is deleted, it is the last address in that dpn's
vpn-to-dpn-list entry, so fibManager.cleanUpDpnForVpn is invoked and the PNF flows are
removed from the dpn as a result.

NAPT controller mode doesn't have this issue because the external router GW interface IP is
contained in the vpn-to-dpn-list entry of the NAPT switch. Thus deleting the last FIP will not
invoke fibManager.cleanUpDpnForVpn since the router GW interface IP still exists in the dpn.
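
The difference between the two modes can be illustrated with a small predicate over a single vpn-to-dpn-list entry. This is a sketch under the assumption that cleanup is triggered exactly when no IP addresses and no VPN interfaces remain on the dpn; the function name is hypothetical, and the entries are copied from the NAPT switch data in this ticket.

```python
def cleanup_after_fip_delete(dpn_entry, fip):
    """Return True if removing 'fip' leaves the dpn entry with nothing on it."""
    remaining_ips = [a for a in dpn_entry.get("ip-addresses", []) if a["ip-address"] != fip]
    remaining_ifs = dpn_entry.get("vpn-interfaces", [])
    return not remaining_ips and not remaining_ifs


# Conntrack mode: the NAPT switch holds only the FIP.
conntrack_napt = {"dpnId": 73535277218113, "ip-addresses": [
    {"ip-address": "10.10.10.13/32", "ip-address-source": "FloatingIP"},
]}
# Controller mode: the NAPT switch also holds the router GW address.
controller_napt = {"dpnId": 8796751999625, "ip-addresses": [
    {"ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP"},
    {"ip-address": "192.168.56.13/32", "ip-address-source": "FloatingIP"},
]}
print(cleanup_after_fip_delete(conntrack_napt, "10.10.10.13/32"))     # True
print(cleanup_after_fip_delete(controller_napt, "192.168.56.13/32"))  # False
```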

The following is the vpn-instance-op-data-entry for the external subnet when FIPs are added:

Conntrack mode:
From the log for SandboxJobs/job1, the dpnid of nodes are:

  • control: 223071002466895, 189fdbb1-eab1-4108-9b2a-bff343503552: external router gw interface
  • compute1: 73535277218113 - NAPT switch, FIP: 10.10.10.13
  • compute2: 116882536471118 - non NAPT switch, FIP: 10.10.10.4

 

{
  "vpn-id": 100001,
  "vpn-instance-name": "aa5d4ad7-8f17-4efe-b0cf-bec0ccc2c086",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 223071002466895,
      "vpn-interfaces": [
        { "interface-name": "189fdbb1-eab1-4108-9b2a-bff343503552" }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 73535277218113,
      "ip-addresses": [
        { "ip-address": "10.10.10.13/32", "ip-address-source": "FloatingIP" }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 116882536471118,
      "ip-addresses": [
        { "ip-address": "10.10.10.4/32", "ip-address-source": "FloatingIP" }
      ]
    }
  ],
  "vrf-id": "aa5d4ad7-8f17-4efe-b0cf-bec0ccc2c086"
}
The following is the entry after the FIP for VM1 is deleted:
{
  "vpn-id": 100001,
  "vpn-instance-name": "aa5d4ad7-8f17-4efe-b0cf-bec0ccc2c086",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 223071002466895,
      "vpn-interfaces": [
        { "interface-name": "189fdbb1-eab1-4108-9b2a-bff343503552" }
      ]
    },
    { "dpn-state": "inactive", "dpnId": 73535277218113 },
    {
      "dpn-state": "active",
      "dpnId": 116882536471118,
      "ip-addresses": [
        { "ip-address": "10.10.10.4/32", "ip-address-source": "FloatingIP" }
      ]
    }
  ],
  "vrf-id": "aa5d4ad7-8f17-4efe-b0cf-bec0ccc2c086"
}
For NAPT controller mode:
 FIPs added:
{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        { "ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP" },
        { "ip-address": "192.168.56.13/32", "ip-address-source": "FloatingIP" }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 8796748560798,
      "ip-addresses": [
        { "ip-address": "192.168.56.17/32", "ip-address-source": "FloatingIP" }
      ]
    }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}
FIP on the NAPT switch (8796751999625) deleted:
{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        { "ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP" }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 8796748560798,
      "ip-addresses": [
        { "ip-address": "192.168.56.17/32", "ip-address-source": "FloatingIP" }
      ]
    }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}
