[OVSDB-198] After unstack and restack of a compute node, new VMs on it cannot communicate with existing VMs on the same network Created: 27/Aug/15  Updated: 11/Oct/15  Resolved: 11/Oct/15

Status: Resolved
Project: ovsdb
Component/s: openstack.net-virt
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Vasanthan Balasubramaniyan Assignee: Ravi Shankar S
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 4208

 Description   

Build#: Lithium-SR1 Build
Openstack: stable/kilo

local.conf snip
=======
enable_plugin networking-odl https://github.com/stackforge/networking-odl stable/kilo
Q_PLUGIN=ml2
ENABLE_TENANT_TUNNELS=True
Q_ML2_TENANT_NETWORK_TYPE=vxlan

Environment
============
OpenStack controller and compute nodes connected to ODL running in external mode.

Problem:
========
1. Created a network on 2.1.1.0/24.
2. Spawned VM1 on compute node 1 and VM2 on compute node 2 (both on the 2.1.1.x network).
3. Unstacked compute node 2.
4. Restacked compute node 2.
5. Spawned VM3 on compute node 2 (on the 2.1.1.x network).
6. VM3 (on compute node 2) is unable to ping VM1 (on compute node 1).
7. Spawned VM4 on compute node 1 (on the 2.1.1.x network).
8. VM4 (on compute node 1) is able to ping VM3 (on compute node 2).
9. VM4 (on compute node 1) is able to ping VM4 (on compute node 1).

Additional Observation
=====================
The "vxlan-20.1.1.3" port is not removed from bridge br-int when the compute node is unstacked.
============SNIP of sudo ovs-vsctl show===================
Bridge br-int
    Controller "tcp:10.16.148.232:6653"
        is_connected: true
    fail_mode: secure
    Port "vxlan-20.1.1.3"
        Interface "vxlan-20.1.1.3"
            type: vxlan
            options: {key=flow, local_ip="20.1.1.2", remote_ip="20.1.1.3"}
=======================================

These stale entries persist until OpenStack is restacked on the other nodes.
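A quick way to spot leftover tunnel ports like the one above is to parse the `ovs-vsctl show` text and compare each vxlan `remote_ip` against the peers that are expected to be up. The following is a minimal sketch, not part of the original report: the function names and the idea of passing in a set of live peer IPs are assumptions made for illustration, and the parser only handles the layout shown in the snippet.

```python
import re

# Sample text in the layout printed by `ovs-vsctl show` (values from this report).
SAMPLE = '''
Bridge br-int
    Controller "tcp:10.16.148.232:6653"
        is_connected: true
    fail_mode: secure
    Port "vxlan-20.1.1.3"
        Interface "vxlan-20.1.1.3"
            type: vxlan
            options: {key=flow, local_ip="20.1.1.2", remote_ip="20.1.1.3"}
'''


def find_vxlan_remotes(show_output):
    """Map each vxlan interface name to its remote_ip (hypothetical helper)."""
    remotes = {}
    current_iface = None
    is_vxlan = False
    for raw in show_output.splitlines():
        line = raw.strip()
        iface = re.match(r'Interface "?([^"]+)"?$', line)
        if iface:
            # A new interface block starts; reset the per-interface state.
            current_iface = iface.group(1)
            is_vxlan = False
            continue
        if line == "type: vxlan":
            is_vxlan = True
            continue
        remote = re.search(r'remote_ip="?([0-9.]+)"?', line)
        if remote and current_iface and is_vxlan:
            remotes[current_iface] = remote.group(1)
    return remotes


def stale_tunnels(show_output, live_peer_ips):
    """Return vxlan ports whose remote end is not in the set of live peers."""
    remotes = find_vxlan_remotes(show_output)
    return sorted(name for name, ip in remotes.items() if ip not in live_peer_ips)
```

For example, `stale_tunnels(SAMPLE, set())` would flag `vxlan-20.1.1.3` while the peer at 20.1.1.3 is unstacked, and return an empty list once that address is passed in as live again.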



 Comments   
Comment by Ravi Shankar S [ 03/Sep/15 ]

When unstack.sh is executed, it removes all the ports, bridges and flows on the corresponding node. ODL does not store any of these flows, so there is no way to recover the old flows. When the same node is re-stacked, only the default flows corresponding to the vxlan tunnel are programmed.

Root cause:

The flow below is missing on the re-stacked node, and that is the root cause of the issue.

Unicast packet flow to a known {tun_id, mac} tuple:
For a known tunnel ID and MAC, unicast packets need to be sent through the vxlan tunnel.

For example:
cookie=0x0, duration=781.112s, table=110, n_packets=6, n_bytes=532, tun_id=0x3f4,dl_dst=fa:16:3e:cf:51:7f actions=output:2
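
To confirm whether a node has this unicast flow, the flow dump can be searched for a table 110 entry matching the {tun_id, mac} pair. The sketch below is an illustration only: it assumes the dump text has the same field layout as the example line above, and the helper names are hypothetical.

```python
# Example flow line copied from this report.
FLOW_DUMP = (
    "cookie=0x0, duration=781.112s, table=110, n_packets=6, n_bytes=532, "
    "tun_id=0x3f4,dl_dst=fa:16:3e:cf:51:7f actions=output:2"
)


def parse_flow(line):
    """Split one flow line into a dict of match fields and its action string."""
    match_part, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in match_part.split(","):
        key, sep, value = token.strip().partition("=")
        if key:
            fields[key] = value if sep else ""
    return fields, actions


def has_unicast_tunnel_flow(flow_dump, tun_id, mac, table=110):
    """True if the dump holds a flow for this tunnel ID and destination MAC."""
    for line in flow_dump.splitlines():
        if "actions=" not in line:
            continue  # skip headers or blank lines
        fields, _ = parse_flow(line)
        if (
            fields.get("table") == str(table)
            and fields.get("tun_id", "").lower() == hex(tun_id)
            and fields.get("dl_dst", "").lower() == mac.lower()
        ):
            return True
    return False
```

Running this against the dump from the re-stacked node with the tunnel ID and MAC of an older VM would be expected to return `False` while the bug is present.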

Why ping to the old VM does not work:

The ARP broadcast message from the re-stacked node's VM reaches the old VM, and the old VM sends a reply. The MAC is learnt by the re-stacked node's VM. When it then tries to send ICMP packets (unicast packets for a known MAC and tunnel ID) to the old VM, the packets are not sent through the vxlan port because the flow above is missing.

This bug cannot be addressed in ODL with the current ovsdb design.

Comment by Ravi Shankar S [ 07/Sep/15 ]

On the new compute node, when I add a VM, only the neutron port addition for that VM is logged. The neutron port southbound events for the older VMs are missing. ODL does not receive southbound events for the older VMs, which is why it is not able to program compute node 2.

The missing rules are added on the compute node only when I hard/soft reboot VM1 and VM2 from the OpenStack UI (after which ping to VM1/VM2 works). When I do this hard/soft reboot, the karaf log shows PORT DELETE and PORT ADD southbound events for the neutron ports of the older VMs.

To fix this issue, either:
1. The neutron port update southbound event should reach ODL for the older VMs, or
2. ODL should keep a cache of all the older VMs so that it can program the new compute node. This is not the current design in ODL.
i. If ODL itself is restarted and a new compute node is then added, that is a separate issue; the cache approach will not help in that case.
ii. In a clustering scenario these details could be maintained in a shard. From a scalability perspective, maintaining the cache would be an overhead for ODL.
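
Option 2 can be pictured with a small in-memory model. This is only a sketch of the idea, written in Python for brevity rather than in ODL's Java, and it does not reflect ODL code or the merged fix: the class name, its methods, and the second MAC address are hypothetical.

```python
from collections import defaultdict


class KnownPortCache:
    """Hypothetical cache of VM ports, keyed by tunnel ID and then MAC."""

    def __init__(self):
        # tun_id -> {mac: tunnel endpoint IP of the hosting compute node}
        self._ports = defaultdict(dict)

    def port_added(self, tun_id, mac, host_ip):
        """Record a port when its southbound add event is seen."""
        self._ports[tun_id][mac.lower()] = host_ip

    def port_deleted(self, tun_id, mac):
        """Forget a port when its southbound delete event is seen."""
        self._ports[tun_id].pop(mac.lower(), None)

    def remote_entries_for(self, new_host_ip):
        """List (tun_id, mac, host_ip) for every cached port hosted elsewhere.

        These are the {tun_id, mac} pairs a re-stacked node would need
        unicast tunnel flows for.
        """
        return sorted(
            (tun_id, mac, host_ip)
            for tun_id, macs in self._ports.items()
            for mac, host_ip in macs.items()
            if host_ip != new_host_ip
        )


cache = KnownPortCache()
cache.port_added(0x3F4, "fa:16:3e:cf:51:7f", "20.1.1.2")  # port on node 1
cache.port_added(0x3F4, "fa:16:3e:00:00:02", "20.1.1.3")  # made-up MAC on node 2
needed = cache.remote_entries_for("20.1.1.3")  # node 2 has just re-stacked
```

As sub-points i and ii note, a cache like this is lost if ODL restarts unless it is persisted, and it grows with the number of ports.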

Comment by Ravi Shankar S [ 01/Oct/15 ]

The fix was merged to stable/lithium:
https://git.opendaylight.org/gerrit/#/c/27515

Comment by Sam Hague [ 11/Oct/15 ]

https://git.opendaylight.org/gerrit/#/c/27515
