[OVSDB-406] Proper handling of json rpc transactions involving deletes particularly physical locators Created: 21/Mar/17 Updated: 21/Mar/18 Resolved: 21/Mar/18 |
|
| Status: | Verified |
| Project: | ovsdb |
| Component/s: | Southbound.hw_vtep |
| Affects Version/s: | Oxygen |
| Fix Version/s: | Oxygen |
| Type: | Bug | ||
| Reporter: | suneel verma | Assignee: | suneelu varma |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 8029 |
| Description |
|
Today physical locator creation is driven from controller , but not deletion. This is posing some problems with subsequent creations. The operational datastore contains locator1 and its uuid. Now if the following updates come. First odl controller will fire delete mcast mac1 to device. In the background Before that previous event happens if Insert mcast mac2 event comes, controller uses the uuid of locator1 to create mac2. Controller has no clue that device is going to delete locator1 and went ahead used its uuid to create mac2 which resulted in this error. |
| Comments |
| Comment by Anil Vishnoi [ 24/Mar/17 ] |
|
Hi Suneel, Do you have any specific approach to fix it ? How about if controller create the mcast mac2 first and then delete mac1 to avoid this situation? Will device throw error if the mac1 already exist for the physical locator? |
| Comment by suneel verma [ 27/Mar/17 ] |
|
How about if controller create the mcast mac2 first and then delete mac1 to avoid this situation? When mac1 delete event came , controller does know that it is going to be followed by insert of mac2 which uses some of the locators of mac1. It simply acts on the mac1 delete event. That way we cannot put it on hold. Moreover this situation can happen for other references of locators like , remote ucast macs and tunnels. |
| Comment by suneel verma [ 27/Mar/17 ] |
|
Two approaches I was thinking. Approach1. Since we put a delay here for retry, by the time it gets rescheduled, Approach2 Now the insert of mac2 event comes. If the device actually deletes the locator , then the locator is removed Then the event (insert mac2 ) which got pushed to queue gets executed since its dependency is resolved from in transit state. If there is a bug in refcounting the locator references ( actually there are references but we thought it is going to be deleted by looking at zero ref count and placed the locator in transit state ), then the device does not delete the locator The dependency queue event (insert mac2) ages out (expires) in 30secs (configured time) and proceeds with the original locator uuid to do insertion of mac. Also discussing the same on patch |
| Comment by Anil Vishnoi [ 27/Mar/17 ] |
|
I am not very confident about the delay based approach, so the first approach looks bit risky and probably can introduce some race conditions etc. Second approach is probably the better approach. As per vtep schema, Physical Locator is referenced by Physical_Locator_Set,Ucast_Macs_Local and Ucast_Macs_Remote. It shows that Ucast_Macs_Local, Ucast_Macs_Remote both are having one to one mapping with physical locator, except Physical_Locator_Set. Physical_Locator_Set is something that is just holding the reference. Given that Ucast_Macs_Local, Ucast_Macs_Remote both has one to one mapping with Physical_Locator, i am wondering if we can club the create/delete of Physical_Locator with Ucast_Macs_Local, Ucast_Macs_Remote create/delete operations. So rather than relying on switch to automatically do that, let the controller explictly do it, that way the functional behavior of the plugin will be more deterministic. |
| Comment by suneel verma [ 28/Mar/17 ] |
|
Ucast mac controlling the creation and deletion of locator makes sense. Today One more variable in picture is tunnel. That is also creating the locator if it does not exist. So in the approach2 if we replace mcast mac1 and mcast mac2 as ucast mac1 and ucast mac2 , does it sound ok ? The only downside of this approach as i was talking is some times it causes unnecessary delays (upto precofigured delay 10-30sec) if the ref counting goes to zero ( if at all a bug is there in ref counting ). |
| Comment by suneel verma [ 04/Apr/17 ] |
|
I have made the changes to add ref count and verified using the suggested approach. |
| Comment by suneel verma [ 05/Sep/17 ] |
|
review in progress |