[GENIUS-119] Auto tunnels: race between creation of tun port and bridge ref, port not created Created: 09/Apr/18  Updated: 13/Jun/19  Resolved: 13/Jun/19

Status: Verified
Project: genius
Component/s: None
Affects Version/s: None
Fix Version/s: Sodium

Type: Bug Priority: Medium
Reporter: Josh Hershberg Assignee: nidhi adhvaryu
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
is blocked by GENIUS-122 DataTreeEventCallbackRegistrar really... Resolved

 Description   

What went wrong functionally:

I'm currently working on CSIT to load test auto-tunnels. The test starts 30 docker containers each containing an OVS. Once the containers are running the test quickly set-manager and set-controllers the OVS instances. Test results show that some of the OVS nodes do not get all tunnel ports.

Why?

The reason for this is a race condition where genius attempts to create a tunnel port on the br-int of a node before the BridgeRedEntry is created for that br-int. As such the following code in OvsInterfaceConfigAddHelper#addTunnelConfiguration never enters the if block and the port is never created:

        LOG.debug("creating bridge interfaceEntry in ConfigDS {}", dpId);
        interfaceMetaUtils.createBridgeInterfaceEntryInConfigDS(dpId, interfaceNew.getName());

        // create bridge on switch, if switch is connectedBridgeRefEntry bridgeRefEntry = interfaceMetaUtils.getBridgeRefEntryFromOperDS(dpId);
        if (bridgeRefEntry != null && bridgeRefEntry.getBridgeReference() != null) {
            LOG.debug("creating bridge interface on dpn {}", dpId);
            InstanceIdentifier<OvsdbBridgeAugmentation> bridgeIid =
                    (InstanceIdentifier<OvsdbBridgeAugmentation>) bridgeRefEntry
                    .getBridgeReference().getValue();
            if (createTunnelPort) {
                southboundUtils.addPortToBridge(bridgeIid, interfaceNew, tunnelName);
            }
...

 

This is evident from the times and line numbers on the following log statements. First we have:

36516 2018-04-08T12:38:16,674 | INFO  | ForkJoinPool-1-worker-3 | OvsInterfaceConfigAddHelper      | 328 - org.opendaylight.genius.interfacemanager-impl - 0.4.1.SNAPSHOT | a      dding tunnel configuration for interface tuneebd1154715
36517 2018-04-08T12:38:16,674 | DEBUG | ForkJoinPool-1-worker-3 | OvsInterfaceConfigAddHelper      | 328 - org.opendaylight.genius.interfacemanager-impl - 0.4.1.SNAPSHOT | c      reating bridge interfaceEntry in ConfigDS 77216042541384

 The next line should have been "creating bridge interface on dpn..." (from the code above) but that line never comes because the BridgeRefEntry is only created here:

37304 2018-04-08T12:38:17,159 | DEBUG | ForkJoinPool-1-worker-3 | InterfaceMetaUtils               | 328 - org.opendaylight.genius.interfacemanager-impl - 0.4.1.SNAPSHOT | C      reating bridge ref entry for dpn: 77216042541384 bridge: InstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.ovsdb      .rev150105.OvsdbBridgeAugmentation, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen      .v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=ovsdb:1]]], org.opendaylight.yang.gen.v1.u      rn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node[key=NodeKey [_nodeId=Uri [_value=ovsdb://uuid/9630ada5-c950-49ec-8599-a53e776ab729/      bridge/br-int]]], org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.ovsdb.rev150105.OvsdbBridgeAugmentation]}


 Comments   
Comment by Faseela K [ 09/Apr/18 ]

jhershbe : There is InterfaceTopologyStateListener which fetches all tunnel interfaces from DS and program them on the switch. Are you saying this also did not work? Because it ended up in race with tunnel-interface creation thread?

Comment by Faseela K [ 10/Apr/18 ]

jhershbe : I can see two approaches to solve this :

 

  1. Make ITM auto-tunnels act upon odl-interface-meta/bridge-ref-info using DataTreeEventCallbackRegistrar utility. OR
  2. Enhance odl-interface-meta/bridge-ref-info yang model to add additional attributes (like local-ip) which is needed for auto-tunnels to operate, and then remove the network-topology listener in ITM, and switch to odl-interface-meta/bridge-ref-info listener to start ITM auto-tunnel configuration.

Suggestions?

Comment by Josh Hershberg [ 10/Apr/18 ]

k.faseela:

There is InterfaceTopologyStateListener which fetches all tunnel interfaces from DS and program them on the switch. Are you saying this also did not work? Because it ended up in race with tunnel-interface creation thread?

I did not delve into the details but, no, it did not work.

Comment by Josh Hershberg [ 10/Apr/18 ]

k.faseela, regarding the two approaches you propose, I'm thinking that since (1) is SO much simpler (much smaller change) and safer it's probably worth giving it a try. I figure since I have that CSIT job I'm working on I can implement (1) and then do some scale tests with and without it to make sure the extra registrations do not cause problems with scale. I don't expect it to be an issue since from 30-40 nodes this race conditions happens just 1-2 times. Sound good? 

Comment by Faseela K [ 10/Apr/18 ]

jhershbe : I remember vorburger initially indicating that this EventCallBackRegistrar has some performance issues, which we will fix in subsequent patches, not sure whether those patches actually came in. Please keep in mind that for Oxygen, Ericsson's requirements is to support 150 switches, and we do have plans to use auto-tunnels.

Comment by Michael Vorburger [ 10/Apr/18 ]

k.faseela I never said that EventCallBackRegistrar has known some performance issues, just that if there are ever reasons that prove that its current implementation is a performance bottleneck, it could possibly be optimized; see also the in-line comment here.

Comment by Faseela K [ 11/Apr/18 ]

Thanks vorburger ! Josh is proceeding with option1.

Comment by Josh Hershberg [ 12/Apr/18 ]

Noted, that's why I want to test it  Incidentally, to the best of my understanding it is unclear whether or not there is a performance problem with the registrar. There are some optimizations that can be done but as of yet it's unclear whether or not they're necessary.

I've been thinking more about this specific bug and fix and I think it is the perfect case for the registrar. The reason is that here we have a kind of corner case and we know that the "happy path" works 99.99% of the time. As such, it would be a shame to rewrite things for this corner case and we have a relatively clear and simply solution in the form of the regsitrar. Just my two cents. Anyway, I plan on checking the performance viability of it.

Comment by Josh Hershberg [ 12/Apr/18 ]

Hah, just saw this and said the same thing a second ago. Sorry for the redundancy.

Comment by Michael Vorburger [ 19/Apr/18 ]

GENIUS-122 is providing the new utility discussed above...

Comment by Faseela K [ 22/May/18 ]

Josh, any update on this? All the required utils are already merged. Were you able to use them?

Comment by Michael Vorburger [ 24/Aug/18 ]

Housekeeping: Looks like this issue didn't make it into Fluorine, thus Fix Version Neon.

Comment by nidhi adhvaryu [ 13/Jun/19 ]

There is a timing issue between SB and NB event while creating bridge. When we are trying to add bridge to ovsdb from SB, NB is trying to add port for same bridge. during which NB will check DS/cache for bridge entry, which will be empty as SB is trying to add it in ovsdb at the same time. So we are adding eventcallback to synchronize both events.

 

The fix for this issue is merged here, https://git.opendaylight.org/gerrit/#/c/81980/

Comment by nidhi adhvaryu [ 13/Jun/19 ]

CSIT has passed on the fix. and it is locally tested.

https://git.opendaylight.org/gerrit/#/c/81980/

Generated at Wed Feb 07 19:59:57 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.