[OVSDB-406] Proper handling of json rpc transactions involving deletes particularly physical locators Created: 21/Mar/17  Updated: 21/Mar/18  Resolved: 21/Mar/18

Status: Verified
Project: ovsdb
Component/s: Southbound.hw_vtep
Affects Version/s: Oxygen
Fix Version/s: Oxygen

Type: Bug
Reporter: suneel verma Assignee: suneelu varma
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8029

 Description   

Today physical locator creation is driven from controller , but not deletion.
Physical locator can be deleted from the device database , when the device sees no references to it.

This is posing some problems with subsequent creations.
When mcast mac1 is created along with locator1

The operational datastore contains locator1 and its uuid.

Now if the following updates come.
at time T1)Delete mcast mac1
at time T2)Insert mcast mac2 ( referring to locator1 )

First odl controller will fire delete mcast mac1 to device.
The device acknowledges success for the transaction.

In the background
The device also deletes locator1 as its only reference is gone and sends an update to the controller ( let us say this sending the delete update to controller is delayed by a sec ).

Before that previous event happens if Insert mcast mac2 event comes, controller uses the uuid of locator1 to create mac2.
This insertion of mac2 fails with error saying such a foreign key does not exist.

Controller has no clue that device is going to delete locator1 and went ahead used its uuid to create mac2 which resulted in this error.



 Comments   
Comment by Anil Vishnoi [ 24/Mar/17 ]

Hi Suneel,

Do you have any specific approach to fix it ? How about if controller create the mcast mac2 first and then delete mac1 to avoid this situation? Will device throw error if the mac1 already exist for the physical locator?

Comment by suneel verma [ 27/Mar/17 ]

How about if controller create the mcast mac2 first and then delete mac1 to avoid this situation?

When mac1 delete event came , controller does know that it is going to be followed by insert of mac2 which uses some of the locators of mac1.

It simply acts on the mac1 delete event. That way we cannot put it on hold.

Moreover this situation can happen for other references of locators like , remote ucast macs and tunnels.

Comment by suneel verma [ 27/Mar/17 ]

Two approaches I was thinking.

Approach1.
Simply retry the insert transaction after some configured delay.

Since we put a delay here for retry, by the time it gets rescheduled,
the locator delete event comes from device,
it gets cleared from operational data store.
As part of re-execution of event post the delay it sees that locator is not present and goes ahead and creates one.

Approach2
Maintain a ref count on locator from config side.
When the ref count goes down to zero , put the locator in transit state.

Now the insert of mac2 event comes.
It sees that its locator state is in transit.
When we see our dependency is in transit, it is pushed to dependency queue for all its dependencies to be resolved.

If the device actually deletes the locator , then the locator is removed
from in transit state let us say with in the next 100ms or so.

Then the event (insert mac2 ) which got pushed to queue gets executed since its dependency is resolved from in transit state.
As part of processing this event (insert mac2 ), it goes ahead and creates locator.

If there is a bug in refcounting the locator references ( actually there are references but we thought it is going to be deleted by looking at zero ref count and placed the locator in transit state ),

then the device does not delete the locator

The dependency queue event (insert mac2) ages out (expires) in 30secs (configured time) and proceeds with the original locator uuid to do insertion of mac.

Also discussing the same on patch
https://git.opendaylight.org/gerrit/#/c/53594/

Comment by Anil Vishnoi [ 27/Mar/17 ]

I am not very confident about the delay based approach, so the first approach looks bit risky and probably can introduce some race conditions etc. Second approach is probably the better approach. As per vtep schema, Physical Locator is referenced by Physical_Locator_Set,Ucast_Macs_Local and Ucast_Macs_Remote. It shows that Ucast_Macs_Local, Ucast_Macs_Remote both are having one to one mapping with physical locator, except Physical_Locator_Set. Physical_Locator_Set is something that is just holding the reference. Given that Ucast_Macs_Local, Ucast_Macs_Remote both has one to one mapping with Physical_Locator, i am wondering if we can club the create/delete of Physical_Locator with Ucast_Macs_Local, Ucast_Macs_Remote create/delete operations. So rather than relying on switch to automatically do that, let the controller explictly do it, that way the functional behavior of the plugin will be more deterministic.

Comment by suneel verma [ 28/Mar/17 ]

Ucast mac controlling the creation and deletion of locator makes sense.

Today One more variable in picture is tunnel. That is also creating the locator if it does not exist.

So in the approach2 if we replace mcast mac1 and mcast mac2 as ucast mac1 and ucast mac2 , does it sound ok ?

The only downside of this approach as i was talking is some times it causes unnecessary delays (upto precofigured delay 10-30sec) if the ref counting goes to zero ( if at all a bug is there in ref counting ).

Comment by suneel verma [ 04/Apr/17 ]

I have made the changes to add ref count and verified using the suggested approach.
It works fine except for mcast macs.
After adding refcounts for locators referred by mcast macs along with ucast macs, it works fine.

Comment by suneel verma [ 05/Sep/17 ]

review in progress
https://git.opendaylight.org/gerrit/61535

Generated at Wed Feb 07 20:36:18 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.