[NETVIRT-488] Openflow DHCP rules not installed with OVS DPDK on controller node Created: 21/Feb/17  Updated: 29/May/18  Resolved: 24/Mar/17

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Boron
Fix Version/s: None

Type: Bug
Reporter: Tim Rozet Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive logs_output_repro_steps.zip     Text File port_security_info.txt    
External issue ID: 7835

 Description   

My setup has two compute nodes and one control node. Both compute nodes have DPDK ports, with a patch port from br-int to br-phy, and use VXLAN as the tenant network type. The control node has no DPDK port and uses a regular Ethernet interface to egress VXLAN traffic.

The issue is that when an instance is created, the flows that should be in table 40 on the controller to allow DHCP are missing. The flows are installed on the compute node, however, and the VXLAN tunnel is created. The instance comes up fine with a vhostuser port and tries to obtain an address via DHCP.

I do not see any errors in the log indicating a failure to install flows, so I'm not sure whether netvirt ever tried to install them.

This is with OVS 2.6 and dpdk 16.11.

Will attach karaf log and ovs outputs.
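
For anyone checking the same symptom, a rough way to confirm whether table 40 is populated is to count flow entries per table in the text printed by "ovs-ofctl -O openflow13 dump-flows br-int". The sketch below only parses text; the function name and the sample flows are made up for illustration and are not taken from the attached logs.

```python
import re
from collections import Counter


def flows_per_table(dump_flows_output: str) -> Counter:
    """Count flow entries per table ID in `ovs-ofctl dump-flows` text."""
    counts = Counter()
    for line in dump_flows_output.splitlines():
        # The table ID appears as "table=N" in the match part of each flow line.
        match = re.search(r"\btable=(\d+)", line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Illustrative sample only (hypothetical flows, not from this setup).
SAMPLE = """\
 cookie=0x0, duration=10.1s, table=0, n_packets=5, n_bytes=420, priority=0 actions=goto_table:17
 cookie=0x0, duration=10.1s, table=17, n_packets=5, n_bytes=420, priority=0 actions=goto_table:40
"""

counts = flows_per_table(SAMPLE)
print("flows in table 40:", counts[40])
```

A count of zero for table 40 on the controller, next to a non-zero count on a compute node, would match the symptom described here.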



 Comments   
Comment by Tim Rozet [ 21/Feb/17 ]

Attachment logs_output_repro_steps.zip has been added with description: Contains ovs outputs, karaf log, and steps to reproduce

Comment by Tim Rozet [ 21/Feb/17 ]

It looks like the problem is that the tap port from the DHCP namespace is down:

[root@overcloud-controller-0 hieradata]# ovs-ofctl -O openflow13 show br-int
OFPT_FEATURES_REPLY (OF1.3) (xid=0x2): dpid:0000dc5cf7654e1b
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS
OFPST_PORT_DESC reply (OF1.3) (xid=0x3):
1(br-ex-patch): addr:1e:23:b7:d3:55:c6
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(tapc61135c5-ba): addr:00:00:00:00:70:bb
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
3(tund547d93ae28): addr:8a:80:0d:a0:46:98
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-int): addr:dc:5c:f7:65:4e:1b
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (OF1.3) (xid=0x5): frags=normal miss_send_len=0

2017-02-21T09:23:16.607Z|00035|bridge|INFO|bridge br-int: added interface tapc61135c5-ba on port 2
2017-02-21T09:23:16.765Z|00036|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on tapc61135c5-ba device failed: No such device
2017-02-21T09:23:16.773Z|00037|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on tapc61135c5-ba device failed: No such device
2017-02-21T09:23:16.774Z|00038|netdev_linux|WARN|tapc61135c5-ba: removing policing failed: No such device

I'm not sure why it is trying to add it as a netdev device.
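
To compare port state across nodes without reading the output by eye, the "ovs-ofctl show" text can be scanned for ports reporting LINK_DOWN. This is a minimal sketch that assumes the output layout shown above (a "N(name): addr:..." line followed by "config:" and "state:" lines); the function name is hypothetical.

```python
import re

# Matches port header lines such as "2(tapc61135c5-ba): addr:00:00:00:00:70:bb".
PORT_RE = re.compile(r"^\s*(\w+)\(([^)]+)\): addr:")


def ports_link_down(show_output: str) -> list:
    """Return names of ports whose state line contains LINK_DOWN."""
    down = []
    current = None
    for line in show_output.splitlines():
        header = PORT_RE.match(line)
        if header:
            current = header.group(2)
            continue
        stripped = line.strip()
        if current and stripped.startswith("state:") and "LINK_DOWN" in stripped:
            down.append(current)
    return down


# Port section copied from the output in this comment.
SHOW_OUTPUT = """\
1(br-ex-patch): addr:1e:23:b7:d3:55:c6
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(tapc61135c5-ba): addr:00:00:00:00:70:bb
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
3(tund547d93ae28): addr:8a:80:0d:a0:46:98
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-int): addr:dc:5c:f7:65:4e:1b
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
"""

print(ports_link_down(SHOW_OUTPUT))
```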

Comment by Tim Rozet [ 21/Feb/17 ]

Looking at the Neutron DHCP agent code and its OVS driver, nothing there configures the bridge for netdev mode. It simply uses vsctl to add a port to the bridge, and that port is apparently being added as netdev by default. My theory is therefore that ODL is putting the bridge into netdev mode, which it shouldn't do when DPDK is not enabled on that openvswitch instance. Can an ODL dev confirm whether ODL puts the switches into netdev mode?

Comment by Josh Hershberg [ 23/Feb/17 ]

Table 40 is the INGRESS_ACL_TABLE

Comment by Josh Hershberg [ 23/Feb/17 ]

I'm not certain it's related to the DHCP port being down, as you suggest. For example, it's always down in my dev environments and it works just fine:
2(tap2278122f-1e): addr:7f:74:00:00:00:00
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max

I did notice that, aside from the patch to br-ex, br-int also has the auto-configured tun interfaces. There was a problem with this recently where it interfered with DHCP. Can you try this with a version that has the following two patches merged?

https://code.engineering.redhat.com/gerrit/#/c/97855/
https://code.engineering.redhat.com/gerrit/#/c/97729

Comment by Josh Hershberg [ 23/Feb/17 ]

Also, it does not look like the bridge is in netdev:

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl get Bridge d7199e75-edcb-484a-ab77-4695e233100b name
br-int
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl get Bridge d7199e75-edcb-484a-ab77-4695e233100b datapath_type
""

Which is weird. Plus:
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-appctl dpctl/dump-dps
system@ovs-system

Whereas on the compute you'd get:
[heat-admin@overcloud-novacompute-1 ~]$ sudo ovs-appctl dpctl/dump-dps
netdev@ovs-netdev
system@ovs-system

So it really looks like netdev is not active on the controller
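
The same comparison can be scripted by extracting the datapath types from the "ovs-appctl dpctl/dump-dps" output, which prints one "type@name" entry per line as shown above. A small sketch (the function name is hypothetical; it only parses text and does not call OVS):

```python
def datapath_types(dump_dps_output: str) -> set:
    """Return the set of datapath types from `dpctl/dump-dps` text."""
    types = set()
    for line in dump_dps_output.splitlines():
        # Each entry looks like "netdev@ovs-netdev" or "system@ovs-system".
        if "@" in line:
            types.add(line.split("@", 1)[0].strip())
    return types


# Outputs copied from the controller and compute nodes above.
controller = "system@ovs-system\n"
compute = "netdev@ovs-netdev\nsystem@ovs-system\n"

print("netdev on controller:", "netdev" in datapath_types(controller))
print("netdev on compute:", "netdev" in datapath_types(compute))
```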

Comment by Tim Rozet [ 23/Feb/17 ]

OK, I saw netdev logging messages and assumed the bridge was in netdev mode, but you're right, it is not. I manually created a tap port and attached it to the namespace; it attaches, but the link state will not come up in OVS. I think the issue is that the OVS 2.6 I have is using the wrong kmod. Let me figure that out and I will report back here.

Comment by Tim Rozet [ 25/Feb/17 ]

Never mind, it looks like even on other setups that work, the port state is always down in OVS:

13(tap738a631a-f4): addr:00:00:00:00:f0:02
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max

and DHCP works in that setup. Must be something broken with the flows.

Comment by Shashidhar R [ 02/Mar/17 ]

It looks like the Neutron port on the controller node was created with port_security_enabled=false.

To debug further, please provide below details:

From openstack:

a. Neutron ports detail
b. neutron security-group-list
c. neutron security-group-rule-list
d. neutron security-group-show default

From ODL(Rest call outputs):

a. http://<controller_ip>:8181/restconf/config/neutron:neutron/security-groups/
b. http://<controller_ip>:8181/restconf/config/neutron:neutron/security-rules/
c. http://<controller_ip>:8181/restconf/config/neutron:neutron/ports/
d. http://<controller_ip>:8181/restconf/config/ietf-interfaces:interfaces/
e. http://<controller_ip>:8181/restconf/operational/ietf-interfaces:interfaces-state/
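
To avoid retyping the five RESTCONF URLs listed above, they can be generated for a given controller address. This is only a sketch: the function name is hypothetical, 192.0.2.10 is a documentation address standing in for the real controller IP, and actually fetching the URLs would normally require ODL credentials, which are not shown here.

```python
# RESTCONF paths as requested in this comment.
RESTCONF_PATHS = [
    "config/neutron:neutron/security-groups/",
    "config/neutron:neutron/security-rules/",
    "config/neutron:neutron/ports/",
    "config/ietf-interfaces:interfaces/",
    "operational/ietf-interfaces:interfaces-state/",
]


def restconf_urls(controller_ip: str, port: int = 8181) -> list:
    """Build the full RESTCONF URLs for one controller."""
    return [
        f"http://{controller_ip}:{port}/restconf/{path}"
        for path in RESTCONF_PATHS
    ]


for url in restconf_urls("192.0.2.10"):
    print(url)
```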

Comment by Tim Rozet [ 02/Mar/17 ]

(In reply to Shashidhar R from comment #9)
Attached the requested info as port_security_info.txt.

From what I can see the port security in neutron is set to false for a dhcp port, but is set to true for the nova instance. This looks normal when comparing it to another setup.

However, in Neutron northbound the port security is set to true for some reason:
neutron-binding:vif-type":"ovs","device-id":"dhcp827da361-9c56-50f7-913f-5a01f7bfed2c-b6a4a0c3-2ec0-45da-954b-05ae44d6c782","tenant-id":"91a9b66d-c9cf-46a4-ae15-34abff12e786","mac-address":"fa:16:3e:60:b7:e9","neutron-portsecurity:port-security-enabled":true}

Also, I see in the oper:
"tapc61135c5-ba","odl-interface:l2vlan-mode":"trunk","type":"iana-if-type:l2vlan","enabled":true}]

Is it supposed to be a type l2vlan in trunk mode?

Note that this setup has been up for some time now, and I think I restarted ODL a few times. It should resync with Neutron, but as a caveat, the setup may no longer be in exactly its original state.

Comment by Tim Rozet [ 02/Mar/17 ]

Attachment port_security_info.txt has been added with description: Requested outputs for security groups

Comment by Shashidhar R [ 06/Mar/17 ]

The Neutron northbound issue related to port_security_enabled=false is resolved by https://git.opendaylight.org/gerrit/#/c/52267/.

Also, I observed that the OpenFlow port connected to the VM is down on the control node. Is this being done intentionally to verify some use case? If not, can you verify this again after bringing the port up?

Along with the above fix, a few other fixes are in the netvirt and genius projects. Please verify this use case again with the latest build.

Comment by Tim Rozet [ 06/Mar/17 ]

(In reply to Shashidhar R from comment #12)
> Neutron north bound issue related to port_security_enabled=false issue is
> resolved by https://git.opendaylight.org/gerrit/#/c/52267/.
>
> Also, I observed that of port connected to VM is down in control node. Is
> this being done intentionally to verify some usecase? If not, can you verify
> this again by making this port UP?
>
> Along with above fix, few other fixes are in netvirt and genius projects.
> Please verify this usecase again with latest build.

Which port are you referring to? There is no VM on the control node, only namespaces. I will retry with newer build.

Comment by Shashidhar R [ 14/Mar/17 ]

I was referring to the output below from the "ovs-ofctl -O openflow13 show br-int" command on the control node:

2(tapc61135c5-ba): addr:00:00:00:00:70:bb
config: PORT_DOWN
state: LINK_DOWN

However, as discussed here, this should not prevent flows from being configured in table 40.

Is this issue still observed with the latest build?

Comment by Tim Rozet [ 24/Mar/17 ]

This was not a bug; the problem was iptables blocking the DHCP request to the controller. After fixing that, everything works fine.
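
The thread does not record which iptables rule was responsible, so the following is only a hedged sketch of how one might look for a similar problem: it parses "iptables-save" text for the default chain policies and for rules naming particular UDP destination ports. DHCP uses UDP ports 67 and 68; checking 4789 (the IANA-assigned VXLAN port) is an assumption based on the tenant network type. The function names and the sample ruleset are hypothetical.

```python
import re


def chain_policies(save_output: str) -> dict:
    """Return {chain: policy} from ':CHAIN POLICY [pkts:bytes]' lines."""
    policies = {}
    for line in save_output.splitlines():
        match = re.match(r"^:(\S+)\s+(\S+)", line)
        if match:
            policies[match.group(1)] = match.group(2)
    return policies


def rules_for_udp_ports(save_output: str, ports) -> list:
    """Return '-A' rules for UDP whose --dport/--dports value names a port.

    Ports are matched when listed explicitly or as a range endpoint.
    """
    wanted = {str(p) for p in ports}
    hits = []
    for line in save_output.splitlines():
        if not line.startswith("-A ") or "-p udp" not in line:
            continue
        match = re.search(r"--dports? (\S+)", line)
        if match and wanted & set(re.split(r"[,:]", match.group(1))):
            hits.append(line)
    return hits


# Made-up ruleset for illustration; not taken from the affected controller.
SAMPLE_RULES = """\
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p udp -m udp --dport 4789 -j ACCEPT
COMMIT
"""

print(chain_policies(SAMPLE_RULES))
print(rules_for_udp_ports(SAMPLE_RULES, [67, 68]))
print(rules_for_udp_ports(SAMPLE_RULES, [4789]))
```

In this sample, INPUT defaults to DROP and no rule mentions ports 67 or 68, which is the kind of combination worth a closer look when DHCP requests never reach the namespace.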
