[OPNFLWPLUG-918] Regression: Controller fails to delete 100K flows from switches Created: 03/Jul/17 Updated: 27/Sep/21 |
|
| Status: | Confirmed |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | Nitrogen, Oxygen, Fluorine |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Luis Gomez | Assignee: | Anil Vishnoi |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 8787 | ||||||||
| Description |
|
This issue happens in Carbon and Nitrogen (not in Boron); it is tracked here:
To reproduce:
1) Start mininet linear 32:
2) Push 100K flows (script available in int/test repo):
   Observe that after the 100K flows are added and the controller is stable, CPU usage is still very high.
3) Remove flows from inventory:
   Observe that after the 100K flows are removed and the controller is stable, there are still flows in the operational datastore that never get removed.
CSIT job for this test: https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-1node-periodic-scale-stats-collection-daily-only-carbon/ |
| Comments |
| Comment by Luis Gomez [ 04/Jul/17 ] |
|
Forgot to add the link: |
| Comment by Tomas Slusny [ 12/Jul/17 ] |
|
So it looks like this is already working on nitrogen, and on carbon it will work once this cherry-pick is merged: https://git.opendaylight.org/gerrit/#/c/60196/1 |
| Comment by Tomas Slusny [ 12/Jul/17 ] |
|
Oh sorry, wrong 100k flows bug; this was meant for the 6755 one. |
| Comment by Luis Gomez [ 31/Jul/17 ] |
|
This issue only happens in Carbon now, and it is FRM related because flows are not removed from the switch when we delete them in the inventory. |
| Comment by Abhijit Kumbhare [ 28/Aug/17 ] |
|
This is a regression for Carbon (throughout Carbon) - works in Boron and in Nitrogen. |
| Comment by Tomas Slusny [ 13/Sep/17 ] |
|
This is sporadically failing on Nitrogen too. Anyway, here is a patch that will resolve this issue: https://git.opendaylight.org/gerrit/#/c/63101/ |
| Comment by Luis Gomez [ 14/Sep/17 ] |
|
Yeah, also if you look at the karaf logs when it fails, there are a few switch disconnects after the DS is cleared in the case of nitrogen and a bunch of disconnects in the case of carbon. This means the controller gets so busy with the DS clear operation that it misses the ECHO requests from switches, and these start to disconnect and reconnect, aggravating the problem. If we do not fix this for nitrogen, I will open a blocker for carbon as the issue is much more apparent in carbon. |
| Comment by Luis Gomez [ 17/Sep/17 ] |
|
BTW, to support my observation I started a couple of tests with the switch ECHO message disabled, and both passed in carbon and nitrogen: BR/Luis |
| Comment by Tomas Slusny [ 18/Sep/17 ] |
|
This was caused by heavy load on the Netty thread, which was probably caused by single-layer serialization. Explanation: with multi-layer, we do the conversion on a different thread and then serialize a simple OFJ data structure. In single-layer, we do everything on the Netty thread. Overall, it is faster by around 1/3 based on YourKit observations I made during a local 100k flow test, but all the load is on the Netty thread, so switches disconnect. The patch I mentioned before solves this by pre-serializing data coming via single-layer and then sending raw bytes to the Netty thread, which causes almost no load on the Netty thread. |
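The pre-serialization idea described in the comment above can be illustrated with a minimal sketch. This is not the actual patch: `FlowMessage`, `serialize()`, the made-up wire format, and the single-thread executor standing in for the Netty thread are all hypothetical and use only the Java standard library. It only shows the shape of the approach, where the expensive conversion runs on the calling thread and the I/O thread receives ready-made bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: FlowMessage, serialize() and the single-thread executor
// are hypothetical stand-ins, not OpenFlowPlugin, OFJ, or Netty APIs.
final class PreSerializeSketch {

    /** Hypothetical high-level message, standing in for a flow model object. */
    static final class FlowMessage {
        final int tableId;
        final String match;

        FlowMessage(int tableId, String match) {
            this.tableId = tableId;
            this.match = match;
        }
    }

    /** Expensive step: turn the model object into wire bytes (made-up format). */
    static byte[] serialize(FlowMessage msg) {
        byte[] match = msg.match.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + match.length);
        buf.putInt(msg.tableId);
        buf.putInt(match.length);
        buf.put(match);
        return buf.array();
    }

    /** Stand-in for the single I/O thread that must stay responsive. */
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();

    /**
     * Serializes on the calling thread, then hands only the raw bytes to the
     * I/O thread, so that thread does a cheap write instead of a conversion.
     */
    Future<Integer> send(FlowMessage msg) {
        byte[] raw = serialize(msg);              // runs on the caller's thread
        return ioThread.submit(() -> raw.length); // I/O thread only sees bytes
    }

    void shutdown() {
        ioThread.shutdown();
    }

    public static void main(String[] args) throws Exception {
        PreSerializeSketch sketch = new PreSerializeSketch();
        try {
            int handedOver = sketch.send(new FlowMessage(0, "in_port=1")).get();
            System.out.println("bytes handed to I/O thread: " + handedOver);
        } finally {
            sketch.shutdown();
        }
    }
}
```

In this sketch the task queued on the I/O thread does no conversion work at all, which is the property that would leave it free to answer ECHO requests while a large batch of flow messages is in flight.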
| Comment by Luis Gomez [ 18/Sep/17 ] |
|
Right, your patch works in oxygen: Do you want to cherry-pick to other branches now to test, or later when your patch is merged in master? |
| Comment by Tomas Slusny [ 22/Sep/17 ] |
|
Probably after it is merged. |
| Comment by Tomas Slusny [ 25/Sep/17 ] |
|
Created cherry-pick for stable/nitrogen: https://git.opendaylight.org/gerrit/#/c/63491/ Cherry-pick to stable/carbon will come after that. |
| Comment by Tomas Slusny [ 25/Sep/17 ] |
|
stable/carbon: https://git.opendaylight.org/gerrit/#/c/63496/ |
| Comment by Tomas Slusny [ 26/Sep/17 ] |
|
Alright, updated the patch for stable/carbon; again I forgot to actually load the configuration flag that enables the preserialization. |
| Comment by Sam Hague [ 26/Sep/17 ] |
|
Tomas, does this patch also cover what was in https://git.opendaylight.org/gerrit/#/c/62792/? That patch also had bug-8787 and was abandoned. It was also supposed to fix https://bugs.opendaylight.org/show_bug.cgi?id=7826, but it didn't, so I am also wondering whether 7826 is fixed by this patch now? |
| Comment by Tomas Slusny [ 26/Sep/17 ] |
|
No, the flow-related patch tried to improve the performance of flow deletion, but it did not (and it also caused more issues), so I abandoned it, as the current flow deletion method that we are using seems more performant. Also, that patch was never supposed to solve This current patch, which solves this issue, does not modify the registry at all, so it will not help the
| Comment by Luis Gomez [ 27/Sep/17 ] |
|
Last patch seems to work so we can close this bug after it is merged. |
| Comment by Tomas Slusny [ 29/Sep/17 ] |
|
The patches were not working for extensions, so I updated them to include real calls to the serialization registry. Unfortunately that required changes in OFJ, and for stable/carbon it is still a separate project, so here is an additional patch for OFJ for stable/carbon: https://git.opendaylight.org/gerrit/#/c/63843/ |
| Comment by Luis Gomez [ 04/Oct/17 ] |
|
We have this issue in all branches because no patch has been merged yet. |
| Comment by Abhijit Kumbhare [ 16/Oct/17 ] |
|
Moving it to Arun. |
| Comment by Abhijit Kumbhare [ 30/Oct/17 ] |
|
Anil is reviewing this set of patches: https://git.opendaylight.org/gerrit/#/q/topic:bug/8787+status:open |
| Comment by Luis Gomez [ 06/Mar/18 ] |
|
From last test results:
|
| Comment by Luis Gomez [ 11/Jun/18 ] |
|
This job started to work after adjusting 2 things:
I think we still have the issue of echo messages not being answered when the controller is busy, so I will leave this bug open for now, reducing the severity to Major. |
| Comment by Anil Vishnoi [ 11/Jun/18 ] |
|
Ideally this test should work without the above-mentioned workaround. This bug is kept open to address this issue without the workaround. |
| Comment by Anil Vishnoi [ 25/Jun/18 ] |
|
Given that we have a workaround for this issue, reducing the priority. |