[OPNFLWPLUG-870] missing method to use barrier when installing dependant groups+flows or flows+packet_out Created: 20/Mar/17  Updated: 27/Sep/21  Resolved: 16/Apr/18

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Highest
Reporter: Jamo Luhrsen Assignee: Hema Gopalakrishnan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
duplicates OPNFLWPLUG-991 FRM: Flow Group Dependency support Resolved
External issue ID: 8025

 Description   

Original email request below; the entire thread can be found here:

https://lists.opendaylight.org/pipermail/openflowplugin-dev/2017-March/006764.html

-----------------------------------------------------------------------------

Hi,

We have a CSIT test in netvirt that tests UDP connectivity using netvirt's SNAT feature, and it is sporadically failing.

The same test in TCP succeeds all the time.

We debugged the flows and it seems there is a race between these three events - ODL receives the initial packet, and then:

a. Installs an inbound flow using NaptEventHandler#buildAndInstallNatFlows()

b. Installs an outbound flow using NaptEventHandler#buildAndInstallNatFlows()

c. Sends the original packet as a packet out to OFPP_TABLE for re-processing by pipeline.

For the test to work properly, (c) must happen after (a) and (b) have been programmed to the switch properly.

The flows are written using genius' mdsalManager.syncInstallFlow(), which does a synchronous write into the flows CONFIG data store.

The packet out is sent via openflowplugin PacketProcessingService.transmitPacket()

Is there a way to ensure (c) is triggered only after (a) and (b) are properly configured?

Perhaps delay the packet out somehow? Use barrier somehow?

If this is not a possibility, we can try two things that are pretty ugly:

a. Reverse (a) and (b) - because then if only (a) is installed (which seems more common), the request would have to be re-punted to the ODL, delaying it. [1] will do that, but we need to run it many times to verify it helps.

This still won't fix anything if (c) happens before both (a) and (b).

b. Leave the bug as-is, and instead fix this at the test level - add a delay to the server's response, or use something other than netcat that retries UDP.

The reason this might be acceptable is that we have flow-based SNAT coming in Carbon, which should eliminate this race.

[1] https://git.opendaylight.org/gerrit/#/c/52380

--alon

-----------------------------------------------------------------------------



 Comments   
Comment by Luis Gomez [ 20/Mar/17 ]

I hope we have a way to guarantee order when things are sent to the same switch, otherwise this seems like a required enhancement.

Comment by Tomas Slusny [ 21/Mar/17 ]

So, your problem is basically that you are trying to synchronize a write to the config datastore with a subsequently sent RPC. This is not correct.

If you want to maintain order between these operations, you could use only the config datastore (and FRS), but since you are sending a packet out, that is not possible. The remaining options are to use only RPCs and chain the resulting futures, or to implement a DataTreeChangeListener on the operational datastore, wait until statistics writing propagates the flows from config to operational, and only then send the packet-out RPC.

1st option example:
1. Call SalFlowService#addFlow with the inbound flow as input and store the returned future
2. Transform this future with an AsyncFunction and return another call to SalFlowService#addFlow with the outbound flow as input, storing the returned future
3. Transform that future with an AsyncFunction and return a call to PacketProcessingService#transmitPacket (assuming that this method returns a future; I am not really sure)

These steps guarantee that the operations are sent to the device in the specified order, but the downside is that nothing is written to the config datastore.
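The chaining pattern in the three steps above can be sketched as follows. This is a minimal, self-contained model of the ordering guarantee only: the `addFlow` and `transmitPacket` methods below are hypothetical stubs standing in for the real `SalFlowService#addFlow` and `PacketProcessingService#transmitPacket` RPCs (which return Guava futures that would be adapted and chained the same way).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the 1st option: chain the two flow installs and the packet-out
// so each call is only issued after the previous one has completed.
public class ChainedFlowInstall {
    // Records the order in which messages would reach the device.
    static final List<String> sent = new ArrayList<>();

    // Hypothetical stub for SalFlowService#addFlow.
    static CompletableFuture<Void> addFlow(String flowName) {
        sent.add(flowName);
        return CompletableFuture.completedFuture(null);
    }

    // Hypothetical stub for PacketProcessingService#transmitPacket.
    static CompletableFuture<Void> transmitPacket() {
        sent.add("packet_out");
        return CompletableFuture.completedFuture(null);
    }

    public static List<String> installAndPunt() {
        addFlow("inbound")                          // step 1
            .thenCompose(v -> addFlow("outbound"))  // step 2: only after inbound completes
            .thenCompose(v -> transmitPacket())     // step 3: only after outbound completes
            .join();                                // wait for the whole chain
        return sent;
    }
}
```

With the real ODL services, `Futures.transformAsync` on the returned `ListenableFuture`s plays the role of `thenCompose` here.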

2nd option example:
Create an implementation of DataTreeChangeListener (or better, ClusteredDataTreeChangeListener) on the operational datastore. Then check whether your flows were successfully written to operational, and only then call PacketProcessingService#transmitPacket.
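The listener-based option can be sketched like this. The `onDataTreeChanged` callback below is a hypothetical stand-in for ODL's DataTreeChangeListener (the real one receives DataTreeModification events from the datastore); the point is only that the packet-out is gated on the operational store reflecting the flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the 2nd option: send the packet_out only once the operational
// datastore confirms the flow was programmed.
public class WaitForOperational {
    static final List<String> events = new ArrayList<>();
    // Completed when the listener sees the flow in the operational store.
    static final CompletableFuture<Void> flowInOperational = new CompletableFuture<>();

    // Hypothetical stand-in for DataTreeChangeListener#onDataTreeChanged,
    // invoked when the flow shows up in the operational datastore.
    static void onDataTreeChanged(String flowId) {
        events.add("operational:" + flowId);
        flowInOperational.complete(null);
    }

    // Hypothetical stub for PacketProcessingService#transmitPacket.
    static void transmitPacket() {
        events.add("packet_out");
    }

    public static List<String> run() {
        // Gate the packet-out on the listener firing.
        CompletableFuture<Void> done =
            flowInOperational.thenRun(WaitForOperational::transmitPacket);
        onDataTreeChanged("snat-inbound"); // simulate the operational-store update
        done.join();
        return events;
    }
}
```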

I cannot imagine any other option, because when writing to config and then sending an RPC, nothing can guarantee that the RPC will not be sent before FRM/FRS starts syncing the flows with the device and the operational DS, so not even correctly sending a barrier will help here.

Comment by Alon Kochba [ 03/Apr/17 ]

(In reply to Tomas Slusny from comment #2)

Tomas, thanks for the answer.
It seems you only referred to the flow_mod+packet_out dependency.

How about group_mod+flow_mod for FRM?
These both go through the DS and can be synchronized; Anil mentioned it is a bug if this doesn't work.
We cannot move to FRS at this time just for this, as it has many other side effects. Maybe in Nitrogen, but we need some fix for Carbon.

Comment by Sam Hague [ 15/May/17 ]

Anil, we added a hack in NetVirt to get around this issue; it uses barriers to force the ordering.

https://git.opendaylight.org/gerrit/#/c/54349/

Comment by A H [ 15/May/17 ]

Patch https://git.opendaylight.org/gerrit/#/c/56786/ was successfully merged. Can we mark this bug as fixed and resolved?

Comment by Anil Vishnoi [ 15/May/17 ]

@Sam, cool, I believe that's the only way to resolve the packet_out + flow ordering issue. An alternative approach is to write the flow to the data store, wait for it to show up in the operational data store, and then send the packet_out, but that would probably introduce unacceptable latency.
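The barrier-based workaround described above amounts to: send both flow_mods, send an OpenFlow barrier request (the switch must finish processing all prior messages before replying), and defer the packet_out until the barrier completes. A minimal sketch, with the wire-level sends stubbed as hypothetical placeholders (in ODL the barrier would go through something like FlowCapableTransactionService's sendBarrier RPC):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the barrier workaround: the barrier request fences the two
// flow_mods, so the packet_out cannot overtake them on the switch.
public class BarrierBeforePacketOut {
    // Records the order in which messages would hit the wire.
    static final List<String> wire = new ArrayList<>();

    // Hypothetical stub for sending one OpenFlow message.
    static CompletableFuture<Void> send(String msg) {
        wire.add(msg);
        return CompletableFuture.completedFuture(null);
    }

    public static List<String> run() {
        send("flow_mod:inbound")
            .thenCompose(v -> send("flow_mod:outbound"))
            .thenCompose(v -> send("barrier_request")) // switch finishes prior messages first
            .thenCompose(v -> send("packet_out"))      // safe: both flows are programmed
            .join();
        return wire;
    }
}
```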

There was another issue of group+flow ordering, which is something we can address in FRM. I will use this bug to track that. Given that your issue has a workaround, I am lowering the priority to critical. If you think the group+flow issue is blocking, please change the priority to blocker.

Comment by Abhijit Kumbhare [ 16/Oct/17 ]

Moved to Hema from Anil.

Comment by Abhijit Kumbhare [ 27/Nov/17 ]

Any luck on this Hema?

Comment by Anil Vishnoi [ 12/Dec/17 ]

Hema, please let me know if you are planning to work on this bug, else I will move it to the unassigned pool.

Comment by Anil Vishnoi [ 06/Feb/18 ]

Hema, any update on this?

Comment by Anil Vishnoi [ 06/Mar/18 ]

Arunprakash is currently working on fixing FRM so that flows and groups can be programmed per their inter-dependency. Once that is implemented, I believe this enhancement is not required.

Comment by Anil Vishnoi [ 06/Mar/18 ]

Arunprakash please link the Jira ticket for FRM changes to this ticket.

Comment by Anil Vishnoi [ 16/Apr/18 ]

This bug will be fixed through OPNFLWPLUG-991. Please see OPNFLWPLUG-991 for updates on this issue.

Generated at Wed Feb 07 20:33:36 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.