[OVSDB-139] OVSDB needs to be more proactive in reporting errors with underlying OVS instances Created: 13/Feb/15  Updated: 19/Oct/17  Resolved: 08/Jun/16

Status: Resolved
Project: ovsdb
Component/s: openstack.net-virt
Affects Version/s: unspecified
Fix Version/s: None

Type: Improvement
Reporter: Tim Rozet Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: All


Attachments: Zip Archive bug_2714.zip    

 Description   

When using OpenStack with ODL integration, Neutron sends REST API calls to ODL for various network creation/configuration requests. Currently, OVSDB always returns 200 OK when asked to provision the underlying VXLAN tunnels, flows, etc. on the OVS instances beneath it. There should be some checking during the Neutron call to ensure, to the best of ODL's ability, that the network can actually be provisioned.

In the case of a tenant instance coming up, ODL relies on two pieces of information: the Neutron call, and Nova adding the tap port to the OVS instance on the compute node. There is currently discussion on how ODL can handle error reporting for this case, where Neutron expects a synchronous response but the underlying operation is asynchronous.

However, in the case of bridges (specifically br-int), if OVSDB receives a request to provision a bridge that does not exist or has gone down, it should proactively tell Neutron that there is an error and that it cannot configure the bridge correctly.
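A minimal sketch of the kind of pre-flight check described above, assuming ODL keeps a per-node view of the connected OVS instances and their bridges. `OvsNode`, `nodes_missing_bridge`, and `handle_network_request` are hypothetical names, not ODL APIs, and returning 503 for the failure case is an assumption; the point is only that the request is refused when a node lacks br-int, instead of always answering 200 OK.

```python
from dataclasses import dataclass, field

# Hypothetical view of the OVS nodes ODL is connected to; in ODL this state
# would come from OVSDB, but these names are illustrative only.
INTEGRATION_BRIDGE = "br-int"


@dataclass
class OvsNode:
    node_id: str
    connected: bool = True
    bridges: set = field(default_factory=set)


def nodes_missing_bridge(nodes, bridge=INTEGRATION_BRIDGE):
    """IDs of nodes that are disconnected or do not have `bridge`."""
    return sorted(n.node_id for n in nodes
                  if not n.connected or bridge not in n.bridges)


def handle_network_request(nodes):
    """Check every node before acknowledging a Neutron request.

    Returns (status, message). Using 503 for the failure case is an
    assumption, not ODL's documented behavior.
    """
    missing = nodes_missing_bridge(nodes)
    if missing:
        return 503, f"{INTEGRATION_BRIDGE} unavailable on: {', '.join(missing)}"
    return 200, "OK"


# Step 2 of the reproduction: br-int removed on the control/network node.
nodes = [OvsNode("compute-1", bridges={"br-int"}),
         OvsNode("control-1", bridges=set())]
print(handle_network_request(nodes))  # (503, 'br-int unavailable on: control-1')
```

A disconnected node is treated the same as a missing bridge here, since ODL cannot configure br-int on either.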

Right now the behavior is as follows:
1. Start with a working setup: one compute node and one control/network node, each running an OVS instance.
2. Remove br-int on the control/network node.
3. In Neutron, add a network, subnet, and router. All return 200 OK.
4. Bring up a tenant instance. ODL sends back 200 OK, provisions one end of the VXLAN tunnel, and pushes flows to br-int on the compute node.
5. The other side of the tunnel is completely missing, since br-int is not there. This should be caught by ODL and reported to Neutron as an error.

The caveat here is that removing br-int on the compute node does cause Nova to report an error, since the tenant instance cannot attach its tap port to br-int, so that case is covered. ODL should still report an error there as well.

Please see attached Neutron and karaf logs.



 Comments   
Comment by Tim Rozet [ 13/Feb/15 ]

Attachment bug_2714.zip has been added with description: contains logs

Comment by Tim Rozet [ 13/Feb/15 ]

Version is Helium SR2 3.0.1

Comment by Sam Hague [ 22/Apr/16 ]

ODL shouldn't need to report the condition in step 5, as far as tunnels are concerned:

5. Other side of the tunnel is completely missing since br-int is not there. This should be caught by ODL and reported to Neutron as an error.

If br-int is missing, ODL has no way to know that this is an error; it works from what it knows at the time the Neutron network is created and when the VMs are instantiated.

When the Neutron network/subnet create is called, ODL simply provisions the only existing br-int, the one on the compute node. The same applies to the VM instances.

But if a Neutron port request comes in for a VM on whichever node (control or compute) has the deleted br-int, and that request still returns 200 OK, then I could see that as an issue. ODL Netvirt will likely silently drop the request, since there is no bridge and no way to know what to do with that port. We still don't have a way to report the condition, though, since the northbound and southbound sides are asynchronous.

Comment by Tim Rozet [ 22/Apr/16 ]

This bug is old, but the main points that should be ensured are these:

1. If an OVS instance that is connected to ODL loses its br-int/flows/conf.d and reconnects to ODL, OVSDB should re-create br-int and push back the missing flows (OVSDB + openflowplugin reconciliation).

2. If a VXLAN Neutron network exists and is rendered by ODL with 2 ports on 2 different nodes, ODL should be smart enough to ensure that both nodes have VXLAN tunnel connectivity. If that connectivity is missing, the L2 network is broken. Reconciliation should happen here as well.
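A rough sketch of the two reconciliation checks listed above, using a simplified, hypothetical state model (bridge names and flows as plain strings, tunnels as (local, remote) node pairs) rather than ODL's actual datastore. `reconcile_node` covers point 1 for br-int and flows only (conf.d is not modeled); `missing_tunnel_ports` covers point 2 by listing every tunnel port that should exist between nodes hosting ports on the same VXLAN network. None of these names are ODL or openflowplugin APIs.

```python
from itertools import permutations

# Hypothetical names throughout; this is not ODL's data model.
INTEGRATION_BRIDGE = "br-int"


def reconcile_node(actual_bridges, actual_flows, desired_flows):
    """Point 1: actions needed when an OVS node reconnects after losing state.

    Returns a list of (action, argument) tuples, bridge creation first.
    """
    actions = []
    if INTEGRATION_BRIDGE not in actual_bridges:
        actions.append(("create_bridge", INTEGRATION_BRIDGE))
        # A freshly re-created bridge has no flows, so every desired flow is missing.
        actual_flows = set()
    for flow in sorted(desired_flows - actual_flows):
        actions.append(("add_flow", flow))
    return actions


def missing_tunnel_ports(port_locations, existing):
    """Point 2: tunnel ports that should exist but do not.

    port_locations: network id -> set of node ids hosting ports on that network
    existing: set of (local_node, remote_node) tunnel ports already present
    Each node needs its own tunnel port toward every peer, so both
    directions of a pair are checked.
    """
    needed = set()
    for nodes in port_locations.values():
        for local, remote in permutations(sorted(nodes), 2):
            if (local, remote) not in existing:
                needed.add((local, remote))
    return sorted(needed)


# The scenario from the report: one tunnel end exists on compute-1 only.
print(missing_tunnel_ports({"net1": {"compute-1", "control-1"}},
                           {("compute-1", "control-1")}))
# [('control-1', 'compute-1')]
```

Modeling tunnels per direction captures the situation in the original report, where one end of the tunnel was provisioned on the compute node and the other end was missing.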

Comment by Sam Hague [ 08/Jun/16 ]

be: https://git.opendaylight.org/gerrit/#/c/40029/
b: https://git.opendaylight.org/gerrit/#/c/39762/2

Generated at Wed Feb 07 20:35:36 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.