[OPNFLWPLUG-630] Li plugin: Scalability issues with OVS 2.4 Created: 04/Mar/16 Updated: 27/Sep/21 Resolved: 30/Jun/16 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Alexis de Talhouët | Assignee: | Alexis de Talhouët |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 5464 |
| Description |
|
While performing scalability tests against the OpenFlowPlugin codebase (stable/lithium branch, -li plugin) with statistics collection enabled, I encountered the following issue:
I will set up tests in ODL infra to track progress and to get a better picture of scalability performance. Work started here: https://git.opendaylight.org/gerrit/#/c/35813/ |
| Comments |
| Comment by Jozef Bacigal [ 16/Mar/16 ] |
|
Alexis, can you please explain what exactly "ODL goes crazy a bit" means, and can you point me to the exact part of the log? Thank you. Jozef |
| Comment by Alexis de Talhouët [ 16/Mar/16 ] |
|
Jozef, please read this mail I sent a while ago: https://lists.opendaylight.org/pipermail/openflowplugin-dev/2016-February/004631.html I know the behavior has changed a bit since a lot of code came in to stabilize transaction/role/etc... Also, it is pretty easy to reproduce: I have created a patch in integration test with all the Dockerfiles and scripts I've been using to create the scale testbed. I can reproduce it with the current codebase by the end of the week if you want me to. Thanks, |
| Comment by Luis Gomez [ 16/Mar/16 ] |
|
This bug is actually very easy to reproduce with OVS 2.4: 1) Download latest ODL Be from Nexus
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007b5c00000, 95944704, 0) failed; error='Cannot allocate memory' (errno=12)
|
| Comment by Alexis de Talhouët [ 17/Mar/16 ] |
|
So I analyzed what is going on: OvS has an inactivity probe timeout that is 5 seconds by default [0]. If, within a period of 2 * inactivity probe (10 seconds by default), the switch doesn't hear from the controller (ECHO_REQUEST, or any other message in fact), the switch will send a disconnect request. [1]
[0] [1] |
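The timing rule described in the comment above can be sketched as a small model. This is only an illustration of the arithmetic (5 s probe, disconnect after 2 * probe of silence), not OVS or OpenFlowPlugin code; the `ProbeTracker` class, its method names, and the choice of `>=` at the boundary are assumptions made for the example.

```python
from dataclasses import dataclass

# Default stated in the comment above: 5 seconds.
DEFAULT_INACTIVITY_PROBE_S = 5.0


@dataclass
class ProbeTracker:
    """Hypothetical model of the switch-side idle timer (not OVS code)."""

    inactivity_probe_s: float = DEFAULT_INACTIVITY_PROBE_S
    last_heard_s: float = 0.0

    def on_message(self, now_s: float) -> None:
        # Any controller message (ECHO_REQUEST or otherwise) resets the timer.
        self.last_heard_s = now_s

    def should_disconnect(self, now_s: float) -> bool:
        # Assumption: the switch gives up once the silence reaches
        # 2 * inactivity probe (10 s with the default).
        return now_s - self.last_heard_s >= 2 * self.inactivity_probe_s


tracker = ProbeTracker()
tracker.on_message(now_s=100.0)
print(tracker.should_disconnect(now_s=109.0))  # 9 s of silence
print(tracker.should_disconnect(now_s=110.0))  # 10 s of silence
```

Under this model, a controller that is too busy to send anything for 10 seconds (for example while processing statistics for many switches) would see its switches drop, which matches the behavior reported in this bug.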
| Comment by Luis Gomez [ 21/Mar/16 ] |
|
Updating bug title to reflect issue only happens in Li plugin. |
| Comment by Jozef Bacigal [ 21/Mar/16 ] |
|
I talked with Michal Rehak, and table features really do produce a huge amount of data and a huge load on the DS. With the option to switch off table features, everything ran smoothly on my local mininet with 127 switches. I would appreciate it if someone would test it properly with this patch: https://git.opendaylight.org/gerrit/#/c/36506/ This is a workaround in any case, but we can deliver it in SR1. If I am not mistaken, everyone can still use the RPC for table feature requests. Thanks Jozef |
| Comment by Alexis de Talhouët [ 22/Mar/16 ] |
|
Jozef, I confirm this increases the scalability of OFP-lithium. In fact, this is the workaround I used for my app. Would you mind amending this patch so that one can define which features one wants? This would be great! Otherwise I can do it, if you agree it makes sense. Thanks, |
| Comment by Jozef Bacigal [ 23/Mar/16 ] |
|
Hi Alexis, I don't think making "all" features configurable is such a good idea: the more things there are to set or change, the more wrong settings you get, along with more problems and questions about how to set them properly. It would be wise to keep the configurable features to a minimum. Jozef |
| Comment by Michal Rehak [ 05/Apr/16 ] |
|
Hi all, Here table-features got moved outside table in order not to put additional heavy load on statistics updates. The trade-off is that downstream apps would need to adapt their instance-identifiers to table features (I checked NIC and DIDM; both are reading table-features). The upside of this fix is that no exclusive flag in config is needed, and projects like NIC/DIDM can still coexist beside other, "table-features free" projects. So please test this change. Thank you. |
| Comment by Luis Gomez [ 05/Apr/16 ] |
|
I like this solution better than the config workaround. One of the reasons is we do not want to ask users to set special configuration to run perf/scale tests, perf/scale should be good out-of-the-box. BR/Luis |
| Comment by Hideyuki Tai [ 20/May/16 ] |
|
Hi all, I've found out that the patch for the As a result, it has broken the NIC project's build. I think we should notify downstream projects of any model changes before we merge the patch for those changes. |
| Comment by Jozef Bacigal [ 23/May/16 ] |
|
Hideyuki, we talked about it a lot, and second, it has been a long time and nobody from NIC noticed that NIC was broken. Here is the link https://lists.opendaylight.org/pipermail/openflowplugin-dev/2016-May/005123.html where we talked about it. If we had waited for someone to give us feedback, we would still be waiting. PEACE Jozef P.S.: Sorry to cause you problems, but I really thought we had all agreed to merge the changes onto master. |
| Comment by Hideyuki Tai [ 23/May/16 ] |
|
(In reply to Jozef Bacigal from comment #12) Yeah, it's quite weird that nobody in the NIC project was aware of that build failure for a long time. And I'm sorry I was not aware you guys had talked about it a lot before the merge. Thanks! |