[NETVIRT-871] SNAT not working from Non-NAPT switch node Created: 25/Aug/17  Updated: 08/Apr/19  Resolved: 10/Oct/17

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Reporter: YOGA LAKSHMI SWETHA PAYYAVULA Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File Dumpflows_9053.txt     Zip Archive compute1_napt_case1.zip     Zip Archive compute1_napt_case2.zip     Zip Archive compute2_non_napt_case1.zip     Zip Archive compute2_non_napt_case2.zip     Zip Archive control_node_case1.zip     Zip Archive control_node_case2.zip     HTML File dump_flows_9035     Zip Archive karaf.log.zip    
Issue Links:
Duplicate
duplicates NETVIRT-803 With conntrack SNAT communication wit... Resolved
External issue ID: 9053

 Description   

ISSUE: The SNAT implementation in ODL chooses a NAPT switch from among the control and compute nodes. The table 21 flows are written only on this NAPT switch node; on the non-NAPT switch they are absent.

Because of this, SNAT communication initiated from the non-NAPT switch fails.

The cause is that the table 47 flows and a few of the table 21 flows are absent on the non-NAPT switch node.

The issue is seen in both controller and conntrack mode, but the behavior differs between the two modes.

In Controller Mode - SNAT communication initiated from the non-NAPT switch fails on the first attempt. When we then try from the NAPT switch node, SNAT works perfectly fine.

After this step, communication from the other VM (on the non-NAPT switch node) succeeds.

The first attempt fails because the table 47 flows and a few of the table 21 flows are absent on the non-NAPT switch node.

Once communication is established from the VM on the NAPT switch node, the table 21 flow appears on the non-NAPT switch node.

In Conntrack Mode - SNAT from the non-NAPT switch fails completely, even after SNAT has been tried from the VM on the NAPT switch node.
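
To check which tables carry flows on one switch but not the other, the two flow dumps can be compared per table. The following is only a minimal sketch, assuming `ovs-ofctl dump-flows` style text; the function names and the shortened sample dumps are made up for illustration and are not taken from the attachments.

```python
import re
from collections import Counter

TABLE_RE = re.compile(r"\btable=(\d+)")


def flows_per_table(dump_text):
    """Count flow entries per table id in dump-flows style text."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = TABLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def tables_missing_on(other, reference):
    """Table ids that have flows in `reference` but none in `other`."""
    return sorted(set(reference) - set(other))


# Hypothetical, shortened dumps (placeholder matches and actions).
napt_dump = """
 cookie=0x1, table=21, priority=42,ip actions=resubmit(,220)
 cookie=0x2, table=22, priority=0 actions=CONTROLLER:65535
 cookie=0x3, table=47, priority=0 actions=drop
"""
non_napt_dump = """
 cookie=0x2, table=22, priority=0 actions=CONTROLLER:65535
 cookie=0x4, table=26, priority=0 actions=drop
"""

missing = tables_missing_on(flows_per_table(non_napt_dump),
                            flows_per_table(napt_dump))
print("tables with flows only on the NAPT switch:", missing)
```

A missing table is not necessarily a fault; it only narrows down where the two pipelines differ.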



 Comments   
Comment by Aswin Suryanarayanan [ 28/Aug/17 ]

The non-NAPT switches will not have the table 47 entries. The packet is forwarded from table 26 over the tunnel to the NAPT switch, and translation happens on the NAPT switch.

Could you please provide the flow dumps from the NAPT and non-NAPT switches?

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 01/Sep/17 ]

Please find attached the flow dumps for the NAPT and non-NAPT switches.

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 01/Sep/17 ]

Attachment dump_flows_9035 has been added with description: dump flows

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 18/Sep/17 ]

Please ignore the previous attachment

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 18/Sep/17 ]

Attachment Dumpflows_9053.txt has been added with description: Napt and no-napt flows

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 18/Sep/17 ]

Steps to Reproduce the Issue in Controller Mode:

1. Create an internal network and subnet
2. Create 2 VMs on the internal network
3. Create an external network and subnet
4. Create a router and associate it with both networks
5. From the VM on the non-NAPT switch, initiate SNAT communication

SNAT communication FAILS

6. From the VM on the NAPT switch, initiate SNAT communication; it succeeds
7. Repeat Step 5; SNAT now works fine

In Conntrack Mode - SNAT from the non-NAPT switch fails completely, even after SNAT has been tried from the VM on the NAPT switch node.
Step 5 and Step 7 both fail.

Comment by Aswin Suryanarayanan [ 19/Sep/17 ]

Could you please upload the flows in conntrack mode as well? Also, were you able to locate exactly where the packet is dropped? Ping is supported in conntrack mode, so you can try it; it will give a sufficient packet count to trace where the packet is dropped. I don't see packet counts for table 26 or for the router MAC flow in table 19 in the flow dump. Is the packet passing successfully through the ACL tables?
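
One way to trace the packet by counters is to take a flow dump before and after the ping and list the flows whose `n_packets` increased. This is a rough sketch, assuming `ovs-ofctl dump-flows` style lines; the two flow lines are the ones quoted in this ticket, but the "after" counter values and the function names are hypothetical.

```python
import re

TABLE_RE = re.compile(r"\btable=(\d+)")
PACKETS_RE = re.compile(r"\bn_packets=(\d+)")
# Fields that change between dumps and must not be part of the flow key.
VOLATILE_RE = re.compile(
    r"\b(duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*")


def packet_counts(dump_text):
    """Map a stable flow key (line without volatile fields) to n_packets."""
    counts = {}
    for line in dump_text.splitlines():
        packets = PACKETS_RE.search(line)
        if not TABLE_RE.search(line) or not packets:
            continue
        key = VOLATILE_RE.sub("", line).strip()
        counts[key] = int(packets.group(1))
    return counts


def flows_hit(before_text, after_text):
    """Return (delta, flow key) for flows whose packet count increased."""
    before = packet_counts(before_text)
    hits = []
    for key, count in packet_counts(after_text).items():
        delta = count - before.get(key, 0)
        if delta > 0:
            hits.append((delta, key))
    return sorted(hits, reverse=True)


before = """
 cookie=0x8000004, duration=1474.961s, table=22, n_packets=254, n_bytes=24892, priority=0 actions=CONTROLLER:65535
 cookie=0x8000007, duration=3619.781s, table=220, n_packets=59, n_bytes=5782, priority=10,reg6=0x400,metadata=0x1/0x1 actions=drop
"""
# Hypothetical second dump taken after sending 5 pings.
after = """
 cookie=0x8000004, duration=1484.961s, table=22, n_packets=254, n_bytes=24892, priority=0 actions=CONTROLLER:65535
 cookie=0x8000007, duration=3629.781s, table=220, n_packets=64, n_bytes=6272, priority=10,reg6=0x400,metadata=0x1/0x1 actions=drop
"""

for delta, flow in flows_hit(before, after):
    print(f"+{delta} {flow}")
```

The last flow in the pipeline whose counter moves is usually the place to look for the drop.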

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment karaf.log.zip has been added with description: ODL log

Comment by balakrishnan k [ 22/Sep/17 ]

(In reply to Aswin Suryanarayanan from comment #5)
> Could you please upload the flows in conntrack mode as well? Also where you
> able to locate where exactly the packet is dropped? In conntrack mode ping
> is supported so you can try it , which will give sufficient packet count to
> trace where it is dropped. I don't see packet counts for table 26 and for
> router mac flow in table 19 in the flow dump. Is it passing successfully
> through the acl tables?

Aswin,
Attached the conntrack flows for control and compute nodes.

Case 1: initiate ping from the non-NAPT switch first.
Compute node 1 is selected as the NAPT switch. We initiate the first ping from the VM instance (30.0.0.7) created on compute node 2.
The packet is not dropped on compute node 1; it finally reaches the table 22 flow:
cookie=0x8000004, duration=1474.961s, table=22, n_packets=254, n_bytes=24892, priority=0 actions=CONTROLLER:65535
and the ping fails.
The controller log shows some errors.

ERROR:
2017-09-22 16:03:02,767 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:03,769 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:04,770 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:05,769 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:06,772 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:47,494 | INFO | pool-16-thread-1 | ArpUtils | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | SubnetRoutePacketInHandler: sendArpRequest dpnId 225338375821825, actions [], groupId 210006, senderIPAddress 192.168.90.21, targetIPAddress 192.168.90.1
2017-09-22 16:03:47,564 | ERROR | pool-16-thread-1 | EvpnUtils | 368 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | getEvpnRd : evpnName is NULL for elanInfo ElanInstance{getElanInstanceName=9143b3ce-1c14-4862-9afa-fad50ed780b8, getElanSegments=[], getElanTag=5003, getMacTimeout=300, getPhysicalNetworkName=physnet1, getSegmentType=class org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.elan.rev150602.SegmentTypeFlat, isExternal=true, augmentations={}}
2017-09-22 16:03:47,564 | ERROR | pool-16-thread-1 | EvpnUtils | 368 - org.opendaylight.netvirt.elanmanager-impl - 0.4.2.SNAPSHOT | advertisePrefix : rd is NULL for elanInfo ElanInstance{getElanInstanceName=9143b3ce-1c14-4862-9afa-fad50ed780b8, getElanSegments=[], getElanTag=5003, getMacTimeout=300, getPhysicalNetworkName=physnet1, getSegmentType=class org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.elan.rev150602.SegmentTypeFlat, isExternal=true, augmentations={}}, macAddress 00:0C:29:4D:9A:B
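
Repeated errors like the ones above are easier to review when grouped. The following is a minimal sketch that assumes the pipe-separated layout seen in these lines (timestamp, level, thread, logger, bundle, message); the function names are made up for illustration, and the sample reuses two of the lines quoted above plus a shortened INFO line.

```python
from collections import Counter


def parse_karaf_line(line):
    """Split one pipe-separated log line into named fields, or return None."""
    parts = [part.strip() for part in line.split(" | ", 5)]
    if len(parts) != 6:
        return None
    keys = ("timestamp", "level", "thread", "logger", "bundle", "message")
    return dict(zip(keys, parts))


def error_summary(log_text):
    """Count ERROR lines grouped by (logger, message)."""
    summary = Counter()
    for line in log_text.splitlines():
        record = parse_karaf_line(line)
        if record and record["level"] == "ERROR":
            summary[(record["logger"], record["message"])] += 1
    return summary


sample_log = """\
2017-09-22 16:03:02,767 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:03,769 | ERROR | pool-16-thread-1 | SubnetRoutePacketInHandler | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | Vpn interface tune09d81ac14f doesn't exist.
2017-09-22 16:03:47,494 | INFO | pool-16-thread-1 | ArpUtils | 360 - org.opendaylight.netvirt.vpnmanager-impl - 0.4.2.SNAPSHOT | (shortened sample message)
"""

for (logger, message), count in error_summary(sample_log).most_common():
    print(f"{count}x {logger}: {message}")
```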

When we initiate ping from the VM instance (30.0.0.8) on the NAPT switch,
the ping succeeds, and I can see the dynamic flow installed in table 21 after the packets are sent to the controller by the flow
cookie=0x8000004, duration=1474.961s, table=22, n_packets=254, n_bytes=24892, priority=0 actions=CONTROLLER:65535

dynamic flow created "cookie=0x8000003, duration=2223.586s, table=21, n_packets=63, n_bytes=6174, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=192.168.90.1 actions=set_field:00:0c:29:4d:9a:b5->eth_dst,load:0x400->NXM_NX_REG6[],resubmit(,220)"

Case 2: initiate ping from the non-NAPT switch after ping has succeeded from the NAPT switch.

In this case the ping from the non-NAPT switch also fails.
The packets reach the NAPT switch and are dropped in table 220:
cookie=0x8000007, duration=3619.781s, table=220, n_packets=59, n_bytes=5782, priority=10,reg6=0x400,metadata=0x1/0x1 actions=drop

Attached are the complete logs for case 1 and case 2.
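
To find drop flows like the table 220 entry above in a full dump, the lines can be filtered on `actions=drop` with a non-zero packet count. This is a small sketch, assuming `ovs-ofctl dump-flows` style text; the function name is hypothetical and the sample uses the three flow lines quoted in this comment.

```python
import re

TABLE_RE = re.compile(r"\btable=(\d+)")
PACKETS_RE = re.compile(r"\bn_packets=(\d+)")


def dropping_flows(dump_text):
    """Return (table, n_packets, match text) for drop flows that were hit."""
    hits = []
    for line in dump_text.splitlines():
        head, sep, actions = line.strip().partition(" actions=")
        if not sep or actions.strip() != "drop":
            continue
        table = TABLE_RE.search(head)
        packets = PACKETS_RE.search(head)
        if table and packets and int(packets.group(1)) > 0:
            hits.append((int(table.group(1)), int(packets.group(1)), head))
    return hits


sample_dump = """
 cookie=0x8000004, duration=1474.961s, table=22, n_packets=254, n_bytes=24892, priority=0 actions=CONTROLLER:65535
 cookie=0x8000003, duration=2223.586s, table=21, n_packets=63, n_bytes=6174, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=192.168.90.1 actions=set_field:00:0c:29:4d:9a:b5->eth_dst,load:0x400->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000007, duration=3619.781s, table=220, n_packets=59, n_bytes=5782, priority=10,reg6=0x400,metadata=0x1/0x1 actions=drop
"""

for table, packets, match in dropping_flows(sample_dump):
    print(f"table {table}: {packets} packets dropped by [{match}]")
```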

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment compute1_napt_case1.zip has been added with description: compute node1 dump flow for case 1

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment compute1_napt_case2.zip has been added with description: compute node1 dump flow for case 2

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment compute2_non_napt_case1.zip has been added with description: compute node2 dump flow for case 1

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment compute2_non_napt_case2.zip has been added with description: compute node2 dump flow for case 2

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment control_node_case1.zip has been added with description: control node dump flow for case 1

Comment by balakrishnan k [ 22/Sep/17 ]

Attachment control_node_case2.zip has been added with description: controlnode dump flow for case 2

Comment by Aswin Suryanarayanan [ 22/Sep/17 ]

Is this 192.168.90.1 your gateway? If so, the table21 flow should be installed when the router g/w is set and not by the subnet route handler, if you had set the g/w while creating the subnet.

If it is a PNF you may be hitting [1]. Are you working with the latest code?

[1]https://bugs.opendaylight.org/show_bug.cgi?id=8882

Comment by balakrishnan k [ 26/Sep/17 ]

(In reply to Aswin Suryanarayanan from comment #14)
> Is this 192.168.90.1 your gateway? If so, the table21 flow should be
> installed when the router g/w is set and not by the subnet route handler, if
> you had set the g/w while creating the subnet.
>
> If it is a PNF you may be hitting [1]. are you working with the latest code?
>
> [1]https://bugs.opendaylight.org/show_bug.cgi?id=8882

Hi Aswin,

192.168.90.1 is not the gateway.
It is an external host we are trying to reach.
The code we used for testing is a bit old, taken from stable/carbon in mid-August.

I tried to evaluate the same with the latest build [1], but patch port auto-creation seems to be failing.

After executing the commands below, the old code creates a patch port between br-int and br-ex; with the new code the patch port is not created automatically.
Is there any open bug for patch port creation?

[1]https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.6.2-SNAPSHOT/distribution-karaf-0.6.2-20170925.072222-788.tar.gz

Comment by balakrishnan k [ 27/Sep/17 ]

Aswin,
I tested with the latest stable/carbon build and am still facing the same issue.
Ping fails from the non-NAPT switch.
I used build [1] for testing; does it have your fix [2]?
Only DNAT is working in the distribution below.

[1]https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.6.2-SNAPSHOT/distribution-karaf-0.6.2-20170926.152417-792.tar.gz

[2]https://bugs.opendaylight.org/show_bug.cgi?id=8882

I am using br-ex for external network communication
OVS show output:

c296d41b-bfe9-49a0-98b0-0e80d6303793
    Manager "tcp:192.168.80.1:6640"
        is_connected: true
    Manager "ptcp:6641:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:192.168.80.1:6653"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tapa8524892-a1"
            Interface "tapa8524892-a1"
        Port "tun4631a9da8ce"
            Interface "tun4631a9da8ce"
                type: vxlan
                options: {key=flow, local_ip="192.168.80.1", remote_ip="192.168.80.3"}
        Port "tapebfc53aa-ab"
            Interface "tapebfc53aa-ab"
        Port br-ex-patch
            Interface br-ex-patch
                type: patch
                options: {peer=br-ex-int-patch}
        Port "tun3811195a8d9"
            Interface "tun3811195a8d9"
                type: vxlan
                options: {key=flow, local_ip="192.168.80.1", remote_ip="192.168.80.2"}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "ens36"
            Interface "ens36"
        Port br-ex-int-patch
            Interface br-ex-int-patch
                type: patch
                options: {peer=br-ex-patch}
    ovs_version: "2.6.1"
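
Whether the patch ports between br-int and br-ex exist and point at each other can be checked from this kind of output. The following is a minimal sketch, assuming `ovs-vsctl show` style text where each patch interface lists `type: patch` followed by an options line containing `peer=`; the function names are made up, and the second sample is a hypothetical case with one side missing.

```python
import re

PEER_RE = re.compile(r'peer="?([^"}\s,]+)')


def patch_peers(show_text):
    """Map each patch interface name to the peer named in its options."""
    peers = {}
    current = None
    is_patch = False
    for raw in show_text.splitlines():
        line = raw.strip()
        if line.startswith("Interface "):
            current = line.split(None, 1)[1].strip('"')
            is_patch = False
        elif line == "type: patch":
            is_patch = True
        elif is_patch and current:
            match = PEER_RE.search(line)
            if match:
                peers[current] = match.group(1)
    return peers


def unpaired(peers):
    """Patch interfaces whose peer does not point back at them."""
    return sorted(name for name, peer in peers.items()
                  if peers.get(peer) != name)


working = """
    Bridge br-int
        Port br-ex-patch
            Interface br-ex-patch
                type: patch
                options: {peer=br-ex-int-patch}
    Bridge br-ex
        Port br-ex-int-patch
            Interface br-ex-int-patch
                type: patch
                options: {peer=br-ex-patch}
"""
# Hypothetical output where the br-ex side was never created.
broken = """
    Bridge br-int
        Port br-ex-patch
            Interface br-ex-patch
                type: patch
                options: {peer=br-ex-int-patch}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
"""

print("unpaired in working setup:", unpaired(patch_peers(working)))
print("unpaired in broken setup:", unpaired(patch_peers(broken)))
```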

Comment by Aswin Suryanarayanan [ 27/Sep/17 ]

Are you seeing any error logs ? Did you use conntrack/controller mode? Ping works only in conntrack mode.

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 04/Oct/17 ]

Hi Aswin,

We tested the SNAT scenario from the non-NAPT switch in conntrack mode; it is working fine. We have yet to test in controller mode.
We used the distro below for testing:

[1]https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.6.2-SNAPSHOT/distribution-karaf-0.6.2-20170925.072222-788.tar.gz

Thank you

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 05/Oct/17 ]

Hi Aswin,

We tested the SNAT scenario from the non-NAPT switch in conntrack mode; it is working fine. We have yet to test in controller mode.
We used the distro below for testing:

[1]https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.6.2-SNAPSHOT/distribution-karaf-0.6.2-20170925.072222-788.tar.gz

Thank you

Comment by YOGA LAKSHMI SWETHA PAYYAVULA [ 05/Oct/17 ]

Hi Aswin,

We tested in controller mode; SNAT is working fine from a non-NAPT switch node. We used the same distro (as mentioned in the last comment). Everything is working fine.

Thank you

Generated at Wed Feb 07 20:22:41 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.