[NETVIRT-738] Error occurs when adding a new L2GW Node to the environment which L2GW is running with Open vSwitch HWVTEP Emulator Created: 19/Jun/17  Updated: 05/Apr/18  Resolved: 05/Apr/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Ran Xiao Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8712

 Description   

What we want to do:
Add L2GW nodes to an environment that already has one or more working L2GW nodes.

What we did (Steps):
1. Create an L2GW Node with the Open vSwitch HWVTEP Emulator
Set 'other_config:ha_enabled=true' and 'other_config:ha_id=0123456789' when creating it.
2. Create a gateway and a connection
Confirmed that communication via the L2GW Node was OK
3. Create a new L2GW Node with the Open vSwitch HWVTEP Emulator
Create an HA Cluster with the node created in step 1
Set 'other_config:ha_enabled=true' and 'other_config:ha_id=0123456789'
The VTEP and MAC were also set to the same values as on the first node

Environment details:
OpenStack Version:stable/ocata
ODL Version:Carbon-FR + patch
patch: https://git.opendaylight.org/gerrit/#/c/56773/
https://git.opendaylight.org/gerrit/#/c/56710/
HWVTEP: Open vSwitch 2.6.1 HWVTEP Emulator
: HA Cluster

karaf.log:
2017-06-14 14:14:35,554 | ERROR | on-dispatcher-66 | TransactInvokerImpl | 292 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.4.1.SNAPSHOT | HWVTEP transaction operation failed referential integrity violation Table Physical_Port column vlan_bindings row d400dc78-d0b2-4a45-beba-326b18eb4d59 references nonexistent row cc7cbc81-eb4d-4f2b-ba04-9dfa6afe849d in table Logical_Switch.
2017-06-14 14:14:35,555 | ERROR | on-dispatcher-66 | TransactInvokerImpl | 292 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.4.1.SNAPSHOT | Failed transaction update [tunnel_ips : [] , ]update [vlan_bindings : {2222=LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f} , ]
2017-06-14 14:14:36,047 | ERROR | on-dispatcher-66 | TransactInvokerImpl | 292 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.4.1.SNAPSHOT | HWVTEP transaction operation failed referential integrity violation Table Ucast_Macs_Remote column locator row 17ddb12a-f4de-487d-908f-1717790100f1 references nonexistent row 00000000-0000-0000-0000-000000000000 in table Physical_Locator.
2017-06-14 14:14:36,047 | ERROR | on-dispatcher-66 | TransactInvokerImpl | 292 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.4.1.SNAPSHOT | Failed transaction insert [name : 30ae2d8c-5361-45be-b7a8-a93abf722e8f , tunnel_key : [96] , ] insert [encapsulation_type : vxlan_over_ipv4 , dst_ip : 10.0.0.10 , ] insert [locators : [PhysicalLocator_Random_61098e5741e640c6a118896b0c87e1c9] , ] insert [locator_set : PhysicalLocatorSet_Random_49a5353230ee467c8a8254689b15a162 , logical_switch : LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f , MAC : unknown-dst , ] insert [logical_switch : LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f , MAC : fa:16:3e:fa:1d:10 , ]
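
To make the two violations above easier to compare, the following is a minimal sketch that pulls the referencing table, column, and missing target out of such a message. The regular expression is derived only from the two log lines in this report; the function name parse_violation is hypothetical, and other message variants may not match.

```python
import re

# Pattern based solely on the "referential integrity violation" text
# seen in the karaf.log excerpt of this report (an assumption, not a spec).
VIOLATION = re.compile(
    r"Table (?P<table>\w+) column (?P<column>\w+) row (?P<row>[0-9a-f-]+) "
    r"references nonexistent row (?P<missing>[0-9a-f-]+) in table (?P<target>\w+)"
)


def parse_violation(line):
    """Return the fields of a violation message as a dict, or None if no match."""
    match = VIOLATION.search(line)
    return match.groupdict() if match else None


line = (
    "HWVTEP transaction operation failed referential integrity violation "
    "Table Physical_Port column vlan_bindings row d400dc78-d0b2-4a45-beba-326b18eb4d59 "
    "references nonexistent row cc7cbc81-eb4d-4f2b-ba04-9dfa6afe849d in table Logical_Switch."
)
info = parse_violation(line)
print(info["table"], info["column"], "->", info["target"])
```

For the first error this shows that Physical_Port.vlan_bindings points at a Logical_Switch row that does not exist yet.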

Cause of failure:
The OVSDB operations are executed in the wrong order.
The UUID of LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f will not be assigned
unless the operation
update [vlan_bindings : {2222=LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f} , ]
is executed after the Logical Switch has been created by the operation
insert [name : 30ae2d8c-5361-45be-b7a8-a93abf722e8f , tunnel_key : [96] , ]
In the log above, the update was issued before the insert.
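
The ordering constraint described above can be sketched as a small dependency sort: an operation that references a named row must run after the operation that creates it. This is only an illustration of the constraint, not how hwvtepsouthbound orders its transactions; the function order_operations and the dict keys "name", "creates", and "requires" are hypothetical.

```python
def order_operations(ops):
    """Order ops so each runs after the ops that create the rows it references.

    Each op is a dict with hypothetical keys:
      "name"     - label for the operation
      "creates"  - named row this op inserts, or None
      "requires" - set of named rows that must exist before this op runs
    """
    # Rows not created by any op in this batch are assumed to exist already.
    created_here = {op["creates"] for op in ops if op["creates"]}
    done, ordered, pending = set(), [], list(ops)
    while pending:
        ready = [op for op in pending if (op["requires"] & created_here) <= done]
        if not ready:
            raise ValueError("circular or unsatisfiable references")
        # Compute the remainder before marking the ready ops as done.
        pending = [op for op in pending if not ((op["requires"] & created_here) <= done)]
        for op in ready:
            ordered.append(op)
            if op["creates"]:
                done.add(op["creates"])
    return ordered


LS = "LogicalSwitch_30ae2d8c_5361_45be_b7a8_a93abf722e8f"
ops = [
    {"name": "update vlan_bindings", "creates": None, "requires": {LS}},
    {"name": "insert Logical_Switch", "creates": LS, "requires": set()},
]
ordered_names = [op["name"] for op in order_operations(ops)]
print(ordered_names)
```

With the two operations from the failing transactions as input, the insert of the Logical Switch is placed before the vlan_bindings update.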

Below are commands we executed.
---- HWVTEP#1/HWVTEP#2 ----
vtep=10.0.0.50
BRIDGE=ocata-l2gw1
HA_ID=0123456789
ovsdb-server --pidfile --detach --log-file --remote=punix:/var/run/openvswitch/db.sock --remote=db:hardware_vtep,Global,managers --remote=ptcp:6632 /etc/openvswitch/ovs.db /etc/openvswitch/vtep.db
ovs-vswitchd --log-file --detach --pidfile unix:/var/run/openvswitch/db.sock
ovs-vsctl add-br $BRIDGE
ovs-vsctl add-port $BRIDGE eth2
vtep-ctl add-ps $BRIDGE
vtep-ctl set Physical_Switch $BRIDGE tunnel_ips=$vtep
/usr/share/openvswitch/scripts/ovs-vtep --log-file=/var/log/openvswitch/ovs-vtep.log --pidfile=/var/run/openvswitch/ovs-vtep.pid --detach $BRIDGE
vtep-ctl set-manager tcp:172.16.1.30:6640
MANAGERUUID=$(vtep-ctl list Manager | grep ^_uuid | awk -F: '{print $2}' | tr -d ' ')
vtep-ctl set Manager $MANAGERUUID other_config:ha_enabled=true
vtep-ctl set Manager $MANAGERUUID other_config:ha_id=$HA_ID
vtep-ctl list Manager
---------------------------



 Comments   
Comment by Ran Xiao [ 22/Jun/17 ]

We are now considering how to fix this.
We would appreciate advice on which project the fix belongs in: ovsdb or netvirt?

Cause of the bug:
When a HWVTEP Node is added, netvirt uses 'GlobalAugmentationMerger' to merge 'RemoteMcastCmd,RemoteUcastCmd,LocalUcastCmd,LocalMcastCmd,LogicalSwitchesCmd,SwitchesCmd' and write the result to MD-SAL all at once.
This triggers ovsdb to write the data into OVS without regard to operation order.
The error is then caused by the incorrect operation order.

Proposal 1: fix in netvirt
Merge only LogicalSwitchesCmd first and write it to MD-SAL
Wait until ovsdb begins to process the update in MD-SAL
Then merge the other data and write it to MD-SAL
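
The three steps of Proposal 1 can be sketched as a two-phase write. This is a rough illustration under stated assumptions, not the code in the Gerrit change: write_in_two_phases, the data-kind labels such as "logical_switches", and the two callables standing in for the MD-SAL write and the wait are all hypothetical.

```python
def write_in_two_phases(merged, write, wait_until_processed):
    """Write logical switches first, wait, then write everything else.

    merged               - dict mapping a data kind (hypothetical labels) to entries
    write                - callable storing one dict (stand-in for an MD-SAL write)
    wait_until_processed - callable that blocks until the first write is picked up
    """
    first = {k: v for k, v in merged.items() if k == "logical_switches"}
    rest = {k: v for k, v in merged.items() if k != "logical_switches"}
    write(first)
    wait_until_processed()
    write(rest)


# Usage with in-memory stand-ins that only record the call order.
calls = []
merged = {
    "remote_ucast": ["fa:16:3e:fa:1d:10"],
    "logical_switches": ["30ae2d8c-5361-45be-b7a8-a93abf722e8f"],
    "remote_mcast": ["unknown-dst"],
}
write_in_two_phases(
    merged,
    write=lambda data: calls.append(("write", sorted(data))),
    wait_until_processed=lambda: calls.append(("wait",)),
)
print(calls)
```

The recorded calls show the logical switch write, then the wait, then the remaining data in a single second write.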

Proposal 2: fix in ovsdb
Check whether the LogicalSwitch data has been written:
if not, do not write the other data into OVS
if so, write the other data into OVS
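
Proposal 2 amounts to a gate in front of the device write: data that depends on a Logical Switch is held back until that switch is known to be written. The following is a minimal sketch of that idea only; the class DeferredWriter and its methods are hypothetical and do not correspond to any existing ovsdb class.

```python
class DeferredWriter:
    """Hold back dependent operations until their logical switch is written."""

    def __init__(self):
        self.known_switches = set()  # logical switches already written
        self.deferred = {}           # switch name -> operations waiting for it
        self.written = []            # operations passed on, in order

    def submit(self, switch_name, op):
        """Write op now if its switch exists, otherwise defer it."""
        if switch_name in self.known_switches:
            self.written.append(op)
        else:
            self.deferred.setdefault(switch_name, []).append(op)

    def logical_switch_written(self, switch_name):
        """Record the switch as written and flush anything waiting on it."""
        self.known_switches.add(switch_name)
        self.written.extend(self.deferred.pop(switch_name, []))


writer = DeferredWriter()
ls = "30ae2d8c-5361-45be-b7a8-a93abf722e8f"
writer.submit(ls, "update vlan_bindings")   # deferred: switch not written yet
writer.logical_switch_written(ls)           # flushes the deferred update
writer.submit(ls, "insert Ucast_Macs_Remote")
print(writer.written)
```

Compared with Proposal 1, this keeps the ordering knowledge on the side that talks to the device, at the cost of tracking deferred operations there.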

Comment by Ran Xiao [ 28/Jun/17 ]

We chose 'Proposal 1: fix in netvirt' to fix this bug and have pushed the code.
Please review the changes.
https://git.opendaylight.org/gerrit/59598

Comment by suneel verma [ 05/Sep/17 ]

Review in progress for plugin
https://git.opendaylight.org/gerrit/#/c/61606/
