[NETVIRT-30] Datapath_id potentially changes as ports are added to bridge Created: 15/Jun/16  Updated: 29/May/18  Resolved: 26/Jul/16

Status: Resolved
Project: netvirt
Component/s: None
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: Josh Hershberg Assignee: Josh Hershberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 6070

 Description   

OVS sets the datapath_id using the following algorithm:

if other_config:datapath_id is specified
use that
otherwise if other_config:hwaddr specified
use that
otherwise if this bridge has ports
use the lowest mac address of the ports on this bridge
otherwise
use a randomly generated mac

this algorithm is re-run every time you add or remove a port from the bridge. So, if a port is added with a lower mac, the dpid will change mid-flight. As you can imagine this should really screw up the state, especially WRT operational vs. configuration. During tests it was observed that when this occurs flows are not properly installed in OVS.



 Comments   
Comment by Josh Hershberg [ 15/Jun/16 ]

Text of email thread on this subject:

On Wed, Jun 15, 2016 at 10:00 AM, Josh Hershberg <jhershbe@redhat.com> wrote:

inline...

On Wed, Jun 15, 2016 at 3:42 PM, Sam Hague <shague@redhat.com> wrote:

On Wed, Jun 15, 2016 at 9:27 AM, Josh Hershberg <jhershbe@redhat.com> wrote:

I​ hadn't noticed that there's a a way to directly set the datapath_id so in my tests I had been setting by setting the hwaddr of br-int. I guess it is more straightforward to set it directly.

The only sticky way to keep dpid is to use the other-config. Isn't that what you are doing? Or are you setting the hwaddr on the ip link for br-int?

​I was setting the hwaddr in other_config together with creating the bridge and the dpid is based on that. There is also the option of setting the datapath_id directly. I'll do that one in the fix I'll push.

yeah, I found conflicting methods - use hwadrr or datapath-id. I use hwaddr and I think that is what is used in the code and in the ovs-vsctl man page it says that only hwaddr is kept on db resets.

So, here's the summary with a few questions peppered in there:

1) By default, we set the dpid together with creating br-int in a single transaction.
2) That is override-able via some configuration option. Can you please provide a little more info about where this option should go? I assume in a config file or something?

Yeah, add it with our other config options. Oded is adding a value for the dhcp enable in the new netvirt. We already have some in the netvirtproviders-config of the old netvirt. It is dead simple to add the config. Just add a leaf to the yang and call the new generated method to get the value and store it in the class. it will need to be migrated to blueprint, but since adding this code is easy I don't think it matters.


OK. Will ask Oded. If using old netvirt, which I guess you are, then yeah, just copy what you see for the table-offset. That is what Oded used for his example.

Also, I assume this is serious enough that it should have a bug opened and its own commit.

Yeah, that is good so we can track it and look it up.

Thanks for all the time, man.

No, thank you for getting this. No idea how this never pops up. I guess as libvirt/neutron is adding vifs to the bridge it is getting lucky that it is a higher value.
Actually, what is the metric for lower port - is it looking at the mac or is it looking at how the port is numbered in the stack - like is the port numbered in the order it is created?

​It's a memcmp of the bytes of the MAC address.​

On Wed, Jun 15, 2016 at 3:02 PM, Sam Hague <shague@redhat.com> wrote:

On Wed, Jun 15, 2016 at 8:03 AM, Josh Hershberg <jhershbe@redhat.com> wrote:

Why does it need to be unique across all nodes?

Here's what ovs itself does when it doesn't have any config or ports:
3182 /* Derive the default Ethernet address from the bridge's UUID. This should
3183 * be unique and it will be stable between ovs-vswitchd runs. */
3184 memcpy(&br->default_ea, &br_cfg->header_.uuid, ETH_ADDR_LEN);
3185 eth_addr_mark_random(&br->default_ea);
This function ^^^ just marks the two low bits of the first byte to 0 (multicast and private) so functionally it's just a 46 bit random value. They use their UUID because it stays constant between restarts which is not something I'm sure we care about especially since up until now the dpid and mac will change from the uuid derivative to the value of the lowest port mac, which is certainly not so unique. From all this it sounds to me like if we just generate our own random mac it should work.

agreed, so we rely on the typical mac uniqueness in this case and get the value for free. The problem is if the deployment or installer wants to set the br-int itself - I wasn't sure if we should override the value. Guess we could make it configurable. By default, ODL sets the value, otherwise keep what is given.

I did find this from a an older nec pugin where they were setting the dpid via the mac: https://github.com/mlowery/bug1280915/blob/master/lib/neutron_plugins/nec

Also, it would seem to me that if we wait for the update when the dpid comes in we could theoretically have a race condition where if someone added a port before we wrote the hwaddr stuff could break, no?

Yeah, that was the one main concern. I think the window is small. And you still ahve the issue if someone created br-int already. But I think we have the same issue even if ODL sets the dpid since the add-br with other-config is a two step process isn't it? Normally you do the add-br br-int – set Bridge br-int other-config xxxx via ovs-vsctl which might bunch it, but I am not sure if our southbound works the same way. Guess it is an easy test to verify.

On Wed, Jun 15, 2016 at 1:47 PM, Sam Hague <shague@redhat.com> wrote:

Just avoiding having to come up with a numbering scheme that would be unique across all nodes.
On Jun 15, 2016 7:45 AM, "Josh Hershberg" <jhershbe@redhat.com> wrote:

How is that better than just setting it manually when we create the bridge?

On Wed, Jun 15, 2016 at 1:44 PM, Sam Hague <shague@redhat.com> wrote:

What if right after we see the bridge connect we set the datapath id to the value seen?
On Jun 15, 2016 7:08 AM, "Josh Hershberg" <jhershbe@redhat.com> wrote:

First created it with "ip link", then set the mac, then SouthboundUtils.addTerminationPoint

On Wed, Jun 15, 2016 at 1:05 PM, Sam Hague <shague@redhat.com> wrote:

Josh,

How did you add the new port with decreased Mac?

Thanks, Sam
On Jun 15, 2016 3:46 AM, "Josh Hershberg" <jhershbe@redhat.com> wrote:

Adding Anil...

Anil, the background here is that in the process of developing the IT infra stuff I noticed that OVS sets the datapath_id using the following algorithm:
if other_config:hwaddr specified
use that
otherwise if this bridge has ports
use the lowest mac address of the ports on this bridge
otherwise
use a randomly generated mac

What's more, this algorithm is re-run every time you add or remove a port from the bridge. So, if a port is added with a lower mac, the dpid will change mid-flight. As you can imagine this should really screw up the state, especially WRT operational vs. configuration. Sam and I were discussing this yesterday and I had said I would validate that my understanding of this problem is correct. That's what the email below is.

On Wed, Jun 15, 2016 at 9:40 AM, Josh Hershberg <jhershbe@redhat.com> wrote:

So...I modified my IT test code so that it would add ports with MACs that sequentially decrease and the dpid changes multiple times. Here's some log lines. I generate [1] them in between each addition of a port:
JOSH - REFRESH old 244431472159045, new 268280838160388
JOSH - REFRESH old 268280838160388, new 268280838160387
JOSH - REFRESH old 268280838160387, new 268280838160386

And...the flows are screwed up [2]. I print them from inside the test after waiting three seconds after adding the three ports.

This is the behavior I expected based on my understanding of the OVS sources so i am not pretty convinced. I really don't know how to explain how this has never surfaced before. As I mentioned last night I suspect that generally MACs are probably generated in ascending order which would probably mask this to some extent. Anyway, I think the easiest way around this is to always set br-int's mac using other_config:hwaddr. Is that safe? Could it break stuff?

I had an idea to use the connection info to generate a MAC. The IP is four bytes and the port is two for a total of 6 == the length of a mac address. However, this is not perfect since in reality a MAC address is 6 bytes minus the multicast bit which must be zero, so in theory you could get a collision there but thought I would share the idea in the spirit of brainstorming. Another option, obviously is that we can just start assign it based on an internal counter of some sort.

[1] Here's the code that prints it:

public void refresh() throws InterruptedException {
long oldDpid = datapathId;
Thread.sleep(3000);

ovsdbNode = itUtils.southboundUtils.getOvsdbNode(connectionInfo);
bridgeNode = itUtils.southboundUtils.getBridgeNode(ovsdbNode, INTEGRATION_BRIDGE_NAME);
datapathId = itUtils.southboundUtils.getDataPathId(bridgeNode);

LOG.warn("JOSH - REFRESH old {}, new {}", oldDpid, datapathId);
}

[2]

2016-06-15 08:36:53,818 | INFO | ion(3)-10.0.0.17 | ProcUtils | 303 - org.opendaylight.ovsdb.utils.ovsdb-it-utils - 1.3.0.SNAPSHOT | ProcUtils.runProcess running "[sudo, docker-compose, -f, /tmp/ovsdb-it-tmp-1430782894565815502.tmp, exec, ovs, ovs-ofctl, -OOpenFlow13, dump-flows, br-int]", waitFor 5000
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x0, duration=19.181s, table=0, n_packets=0, n_bytes=0, dl_type=0x88cc actions=CONTROLLER:65535
cookie=0x0, duration=19.181s, table=0, n_packets=0, n_bytes=0, priority=0 actions=goto_table:20
cookie=0x0, duration=8.685s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x65,arp_tpa=10.0.0.3,arp_op=1 actions=move:NXM_OF_ETH_SRC[]>NXM_OF_ETH_DST[],set_field:f4:00:00:0f:00:02>eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]>NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]>NXM_OF_ARP_TPA[],load:0xf400000f0002->NXM_NX_ARP_SHA[],load:0xa000003->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=8.668s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x65,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]>NXM_OF_ETH_DST[],set_field:f4:00:00:0f:00:04>eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]>NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]>NXM_OF_ARP_TPA[],load:0xf400000f0004->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=8.651s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x65,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]>NXM_OF_ETH_DST[],set_field:f4:00:00:0f:00:03>eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]>NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]>NXM_OF_ARP_TPA[],load:0xf400000f0003->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=19.203s, table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:30
cookie=0x0, duration=19.199s, table=30, n_packets=0, n_bytes=0, priority=0 actions=goto_table:40
cookie=0x0, duration=19.199s, table=40, n_packets=0, n_bytes=0, priority=0 actions=goto_table:50
cookie=0x0, duration=19.181s, table=50, n_packets=0, n_bytes=0, priority=0 actions=goto_table:60
cookie=0x0, duration=19.181s, table=60, n_packets=0, n_bytes=0, priority=0 actions=goto_table:70
cookie=0x0, duration=19.203s, table=70, n_packets=0, n_bytes=0, priority=0 actions=goto_table:80
cookie=0x0, duration=19.181s, table=80, n_packets=0, n_bytes=0, priority=0 actions=goto_table:90
cookie=0x0, duration=19.203s, table=90, n_packets=0, n_bytes=0, priority=0 actions=goto_table:100
cookie=0x0, duration=19.203s, table=100, n_packets=0, n_bytes=0, priority=0 actions=goto_table:110
cookie=0x0, duration=19.199s, table=110, n_packets=0, n_bytes=0, priority=0 actions=drop

Comment by Anil Vishnoi [ 15/Jun/16 ]

Josh,

Do you see connection to controller gets disconnected when the datapath_id changes? If not, i believe this is a bug with OVS code that should be fixed. Because openflow protocol, discover datapath id when switch connects to the controller, and if switch changes it datapath id without notifying the controller or disconnecting/re-connection, there is no way for controller to know the new datapath-id. And if switch keep changing the datapath-id that will create lot of issues. I don't really see any good technical reason why OVS will do that.

Comment by Josh Hershberg [ 16/Jun/16 ]

(In reply to Anil Vishnoi from comment #2)
> Josh,
>
> Do you see connection to controller gets disconnected when the datapath_id
> changes? If not, i believe this is a bug with OVS code that should be fixed.
> Because openflow protocol, discover datapath id when switch connects to the
> controller, and if switch changes it datapath id without notifying the
> controller or disconnecting/re-connection, there is no way for controller to
> know the new datapath-id. And if switch keep changing the datapath-id that
> will create lot of issues. I don't really see any good technical reason why
> OVS will do that.

There's a notification of the change just like the initial notification where you get the dpid the first time but I did not see that the connection flopped. I agree that this is a strange behavior though it seems to be pretty deliberate in the source code. I asked Flavio Leitner about it and he said he thought it had something to do with STP though I can't imagine why or how. Anyway, agree that we should clarify this with them.

Generated at Wed Feb 07 20:20:31 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.