[SFC-167] RSP is not found Created: 26/Oct/16  Updated: 19/Oct/17  Resolved: 03/Nov/16

Status: Verified
Project: sfc
Component/s: General
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Manuel Buil Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
is duplicated by NETVIRT-110 Missing flows in OPNFV SFC testcase (... Resolved
External issue ID: 7039

 Description   

We are having a blocking issue in an OPNFV SFC test case with the classifier (we are using the old netvirt “odl-ovsdb-openstack” and the official SR0 of Boron). We don’t understand why, but the first time we create a classification rule, no classification flows appear in the pipeline (table=11 only has the default flow pointing to table=21). However, if we delete that rule and create a new one, it works.

These are the logs we get in karaf.log the first time we try (not working):

http://pastebin.com/akTeSrz6

These are the logs we get in karaf.log the second time we try (after deleting the previous classifier rule; this works!):

http://pastebin.com/P8LQwUiN

As you can observe, both attempts initially fail to get an rsp when calling getRenderedServicePath, but the second time we try, it seems to try again and gets it. Note that before the creation of the classifier was triggered, I queried ODL to check whether the RSP was there, and yes, it was there. Here are two screenshots with the stack at the moment when it tries to read the rsp from md-sal:

Not working: https://www.dropbox.com/s/38dy03nnxiclbv3/capture%20-%20not%20working.PNG?dl=0

Working: https://www.dropbox.com/s/9y8uhv5u96xyrb8/Capture%20-%20working%20%281%29.PNG?dl=0

That is line 71 of SfcUtils.java:

return mdsalUtils.read(LogicalDatastoreType.OPERATIONAL, getRspId(rspName));

The “capture – not working” screenshot shows the stack when it will return rsp = null. The “capture working” screenshot shows the stack when it returns a correct rsp and, consequently, correct flow rules are written in the pipeline. Observe that when it works, the call was triggered from “removeClassifierRules”, and when it does not work, the call is triggered by “addClassifierRules”. In other words, the log output rsp=null is a result of calling the method “addClassifierRules”, whereas the log output where rsp is not null is a result of calling the method “removeClassifierRules”. As you can see in the second set of logs, when we create the second classifier, both methods are called, first “addClassifierRules” and afterwards “removeClassifierRules”; I don’t understand why.

I also noticed that when I remove a classifier rule, it does not get deleted from the pipeline until I create another classifier rule. So it is as if the deletion task stays pending until a creation task appears, and then the creation task checks whether there are any deletion tasks to be done.

So, my hypothesis is: the code only gets the right rsp when removing the classifier rules. This removal action is triggered when calling the creation of rules; it then removes the rules which must be removed and adds the appropriate rules. When we try the first time, as there is nothing to be deleted, it does not work because only the method “addClassifierRules” is called, and that one, for reasons unknown to me, fails to get the rsp.



 Comments   
Comment by Sam Hague [ 26/Oct/16 ]

Recap of what is in the thread...

There has to be an rsp in the mdsal at the time the acl is created before netvirtsfc can add flows. In the captured logs, that rsp does not exist. Sometime later an rsp is there, and then flows are programmed. Adding the rsp is out of scope for netvirtsfc.

Please attach full logs with trace enabled in both the sfc and netvirt projects. From that we can see the full flow and why the rsp is not written to mdsal.

Comment by Brady Johnson [ 28/Oct/16 ]

We did some more testing with more debug logs and found the problem.

The rsp (rendered service path, the actual service chain) is being created correctly, but we see it gets deleted 1 second later, before the netvirt classifier is created, thus the log message by netvirt that the rsp doesn't exist.
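The ordering described above can be replayed with a minimal sketch. It assumes an in-memory map as a stand-in for the operational datastore; the class and method names (`RspRaceSketch`, `createRsp`, `onSffModified`, `readRsp`, `classifierFindsRsp`) are hypothetical and are not taken from the sfc or netvirt code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class RspRaceSketch {
    // In-memory stand-in for the operational datastore (hypothetical, not MD-SAL).
    private final Map<String, String> operational = new ConcurrentHashMap<>();

    void createRsp(String rspName) {
        operational.put(rspName, "rendered-path-for-" + rspName);
    }

    // Behaviour reported in this thread before the fix:
    // an SFF modification causes the RSP to be deleted.
    void onSffModified(String rspName) {
        operational.remove(rspName);
    }

    // Same shape as the read in the description: an absent RSP gives an empty result.
    Optional<String> readRsp(String rspName) {
        return Optional.ofNullable(operational.get(rspName));
    }

    // Replays the order of events: create the RSP, optionally modify the SFF,
    // then let the classifier look the RSP up.
    static boolean classifierFindsRsp(boolean sffModifiedInBetween) {
        RspRaceSketch store = new RspRaceSketch();
        store.createRsp("rsp1");
        if (sffModifiedInBetween) {
            store.onSffModified("rsp1");
        }
        return store.readRsp("rsp1").isPresent();
    }

    public static void main(String[] args) {
        System.out.println("without SFF modification: " + classifierFindsRsp(false));
        System.out.println("with SFF modification:    " + classifierFindsRsp(true));
    }
}
```

With the SFF modification in between, the lookup comes back empty, which matches the rsp = null seen from “addClassifierRules”.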

We investigated further to see why/who deletes the rsp. We figured out that tacker creates everything correctly for the first rsp and then starts creating a second rsp. When tacker is creating the second rsp, it modifies the sff (service function forwarder), which currently causes SFC to delete the rsp.

If the sff is deleted or if certain fields are modified, the rsp should be deleted. But in this case, the sff mods are minimal and the rsp should not be deleted. So the fix to sfc is when an sff is modified, only delete the rsp when absolutely necessary.
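The idea of deleting the rsp only when necessary can be sketched as a comparison of the sff before and after the change. This is not the code from the patch: the `Sff` class is a hypothetical, simplified model, and the choice of which fields count as affecting the rsp (name, data plane address and port, removal of an attached SF) is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class SffChangeCheck {
    // Hypothetical, simplified SFF; the real model has many more fields.
    static final class Sff {
        final String name;
        final String dataPlaneIp;
        final int dataPlanePort;
        final Set<String> sfNames;   // SFs attached to this SFF
        final String description;    // assumed not to affect any RSP

        Sff(String name, String dataPlaneIp, int dataPlanePort,
            Set<String> sfNames, String description) {
            this.name = name;
            this.dataPlaneIp = dataPlaneIp;
            this.dataPlanePort = dataPlanePort;
            this.sfNames = sfNames;
            this.description = description;
        }
    }

    // Returns true only when the change can invalidate an existing RSP.
    static boolean rspMustBeDeleted(Sff before, Sff after) {
        if (after == null) {
            return true;   // the SFF itself was deleted
        }
        if (before == null) {
            return false;  // newly created SFF, no RSP depends on it yet
        }
        return !Objects.equals(before.name, after.name)
                || !Objects.equals(before.dataPlaneIp, after.dataPlaneIp)
                || before.dataPlanePort != after.dataPlanePort
                // an SF that was attached before is no longer attached
                || !after.sfNames.containsAll(before.sfNames);
    }

    public static void main(String[] args) {
        Sff original = new Sff("sff1", "10.0.0.1", 6633,
                new HashSet<>(Arrays.asList("sf1")), "first chain");
        Sff withExtraSf = new Sff("sff1", "10.0.0.1", 6633,
                new HashSet<>(Arrays.asList("sf1", "sf2")), "two chains");
        Sff moved = new Sff("sff1", "10.0.0.2", 6633,
                new HashSet<>(Arrays.asList("sf1")), "first chain");

        System.out.println(rspMustBeDeleted(original, withExtraSf));
        System.out.println(rspMustBeDeleted(original, moved));
        System.out.println(rspMustBeDeleted(original, null));
    }
}
```

Under these assumptions, attaching an additional SF (as happens when a second chain is being set up) leaves the existing rsp alone, while moving or deleting the sff still removes it.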

Comment by Brady Johnson [ 28/Oct/16 ]

This is fixed in stable/boron here:

https://git.opendaylight.org/gerrit/#/c/47739/

It's currently being tested in OPNFV. Upon completion there, we'll merge the patch and close this bug.

Brady

Comment by A H [ 28/Oct/16 ]

A patch was submitted to fix this bug in Boron SR1: https://git.opendaylight.org/gerrit/#/c/47739/

To better assess the impact of this bug and fix, could someone from your team please help us identify the following:
Regression: Is this bug a regression of functionality/performance/feature compared to Boron?
Severity: Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Boron-SR1 without it? Is there a workaround such that we can write a release note?
Testing: Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?
Impact: Does this fix impact any dependent projects?

Comment by Brady Johnson [ 28/Oct/16 ]

Regression:
Is this bug a regression of functionality/performance/feature compared to Boron?

No, it is not. It's been a failure for some time, and we just found it in OPNFV.

Severity:
Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Boron-SR1 without it? Is there a workaround such that we can write a release note?

This is quite severe. With just a simple modification to a Service Function Forwarder (SFF) or Service Function (SF), all related Rendered Service Paths (RSPs, the actual service chains) are deleted. The RSP should only be deleted if the SFF is deleted, the SF is deleted, or either is modified in such a way that the RSP is affected.

Testing:
Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?

Unit tests were modified and added to ODL to test this patch. I also created a distro to test this patch in OPNFV, where the testing is quite extensive.

Impact:
Does this fix impact any dependent projects?

No

Comment by Brady Johnson [ 28/Oct/16 ]

This has been successfully tested in OPNFV. Closing.

Comment by A H [ 01/Nov/16 ]

Has this bug been verified as fixed in the latest Boron SR1 Build 20161030?
