[OPNFLWPLUG-217] 24hr performance degradation Created: 18/Jul/14 Updated: 27/Sep/21 Resolved: 17/Aug/16 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Christopher O'Shea | Assignee: | Marcus G K Williams |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Linux |
||
| Attachments: |
|
| External issue ID: | 1395 |
| Description |
|
Following the steps in the Wiki https://wiki.opendaylight.org/view/CrossProject:Integration_Group:Performance_Test with "dropAllPacketsRPC on", there is a noticeable performance degradation. Every 30 minutes cbench is run (cbench -c $controllerip -p 6633 -m 10000 -l 10 -s 256 -M 100000 -w 1). (Thu Jul 17 07:53:43 PDT 2014) 30 minutes after starting the controller, min/max/avg is 21434.08/22523.66/21841.65. See the CSV file for the detailed report. |
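Editor's note: to track the degradation reported here across repeated cbench runs, the min/max/avg triplet can be parsed from each run's output and compared against the first run. A minimal sketch follows; the function names and the exact result-line format matched are assumptions, not part of cbench itself:

```python
import re

def parse_min_max_avg(line):
    """Extract a min/max/avg triplet (flows/sec) from a result line such as
    'min/max/avg is 21434.08/22523.66/21841.65'. The exact cbench output
    format may vary; this matches any three slash-separated decimals."""
    m = re.search(r"(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)", line)
    if m is None:
        raise ValueError("no min/max/avg triplet found: %r" % line)
    return tuple(float(x) for x in m.groups())

def degradation_pct(first_avg, later_avg):
    """Percent drop in average throughput relative to the first run."""
    return 100.0 * (first_avg - later_avg) / first_avg
```

For example, comparing a first-run average of 40k flows/sec against a later 30k would report a 25% degradation, the kind of downward trend described in this report.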
| Comments |
| Comment by Christopher O'Shea [ 18/Jul/14 ] |
|
Attachment OpenFlow performance.png has been added with description: Visual plot of performance. |
| Comment by Christopher O'Shea [ 18/Jul/14 ] |
|
Attachment results-18-6-2014.csv has been added with description: CSV break down of memory and CPU stats |
| Comment by Daniel Farrell [ 06/Aug/14 ] |
|
Here's another example of this observed performance degradation[1]. Note that (perhaps obviously) restarting ODL between CBench runs mitigates this decrease[2]. [1]: https://github.com/dfarrell07/cbench_regression/issues/13#issuecomment-47366340 |
| Comment by Christopher O'Shea [ 10/Sep/14 ] |
|
Switched to using the Karaf controller RC0.
The first set of tests ran fine. |
| Comment by Christopher O'Shea [ 16/Sep/14 ] |
|
I had the controller running for 24 hours, but wasn't running Cbench every 30 minutes. Looks like the first run after it stabilized was around 38k, and it didn't change when I ran it again 24 hours later. Will set it to run every 30 minutes tonight and update the ticket tomorrow. |
| Comment by Daniel Farrell [ 19/Nov/14 ] |
|
I'm still seeing perf degradation over time in Helium 0.2.1. Lots of details on this WCBench wiki page[1]. [1]: https://github.com/dfarrell07/wcbench/wiki/Helium-0.2.1-Performance-Results#results-0 |
| Comment by Daniel Farrell [ 10/Feb/15 ] |
|
The perf results wiki I linked to previously has changed addresses. The updated URL (including much more detailed results) is: https://github.com/dfarrell07/wcbench/wiki/Unusual-Helium-0.2.1-Performance-Results |
| Comment by Marcus G K Williams [ 23/Feb/15 ] |
|
Reproducing this issue. I'll work on a fix after reproduction. |
| Comment by Marcus G K Williams [ 18/Mar/15 ] |
|
Graph of |
| Comment by Marcus G K Williams [ 18/Mar/15 ] |
|
Attachment OF_1_Bug1395.PNG has been added with description: Graph of CBench Results showing |
| Comment by Marcus G K Williams [ 18/Mar/15 ] |
|
I've reproduced this many times. I've attached a graph of the results I'm seeing on Helium SR1, though I've also reproduced it on the latest integration nightly build. I'm currently Java-profiling ODL during bug reproduction in an attempt to narrow down likely causes and focus on the correct area of the code. Initial profiling data shows ever-increasing nioEventLoopGroup threads after each cbench run. More analysis is needed before this can be definitively named as the root cause. The source of the nio thread generation is under investigation and is being traced back to the code. |
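Editor's note: the thread growth described above can be checked from periodic `jstack` thread dumps, since Netty names its event-loop threads with a `nioEventLoopGroup-` prefix. A minimal sketch, assuming plain-text `jstack` output where each thread entry starts with the quoted thread name (the helper names are hypothetical):

```python
def count_nio_threads(jstack_output):
    """Count threads whose name contains 'nioEventLoopGroup' in the text
    output of `jstack <pid>`. Thread-dump entries begin with the quoted
    thread name, e.g. '"nioEventLoopGroup-3-1" #42 prio=10 ...'."""
    return sum(1 for line in jstack_output.splitlines()
               if line.startswith('"') and "nioEventLoopGroup" in line)

def is_leaking(counts, tolerance=0):
    """Given thread counts taken after successive cbench runs, report
    whether the count grows strictly between every pair of snapshots,
    which matches the pattern seen while profiling this bug."""
    return all(b > a + tolerance for a, b in zip(counts, counts[1:]))
```

Taking one dump after each cbench run and feeding the counts to `is_leaking` would confirm whether the event-loop thread pool keeps growing instead of being reused.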
| Comment by Jamo Luhrsen [ 20/Jul/15 ] |
|
I've retested this with the Lithium release and the issue is still there. |
| Comment by Jamo Luhrsen [ 20/Jul/15 ] |
|
Attachment 1395_Lithium_data.ods has been added with description: re-test with Lithium |
| Comment by Sania Zara [ 06/Oct/15 ] |
|
Submitted a patch for review https://git.opendaylight.org/gerrit/#/c/27867/ |
| Comment by Jamo Luhrsen [ 12/Oct/15 ] |
|
Attachment 1395.odc has been added with description: degradation chart with potential fix |
| Comment by Jamo Luhrsen [ 12/Oct/15 ] |
|
(In reply to Jamo Luhrsen from comment #13) The attached chart is the data I obtained when testing with Sania's fix. |
| Comment by Daniel Farrell [ 13/Oct/15 ] |
|
> The attached chart is the data I obtained when testing with Sania's fix. Am I reading this correctly? The summary is that perf is still degrading in a similar way to the initial reports? |
| Comment by Jamo Luhrsen [ 13/Oct/15 ] |
|
(In reply to Daniel Farrell from comment #15) that's correct. |
| Comment by Muhammad Umar Hameed [ 26/Oct/15 ] |
|
I have tested Sania's patch and it's working fine with no degradation. |
| Comment by Muhammad Umar Hameed [ 26/Oct/15 ] |
|
Attachment 1395_fix.csv has been added with description: 1395fix |
| Comment by Daniel Farrell [ 27/Oct/15 ] |
|
(In reply to Muhammad Umar Hameed from comment #17) Your standard deviation is quite high, which could be masking a trend. Still, an interesting result, thanks. Graph attached. /wcbench$ ./stats.py -s flows 12:12:53 , |
| Comment by Daniel Farrell [ 27/Oct/15 ] |
|
Attachment Graph of 1395fix CSV file flows.png has been added with description: Graph of 1395fix CSV file created using WCBench |
| Comment by Muhammad Umar Hameed [ 31/Oct/15 ] |
|
Has anyone else tested with the same patch? |
| Comment by Daniel Farrell [ 02/Nov/15 ] |
|
(In reply to Muhammad Umar Hameed from comment #17) Jamo also verified the fix and the patch has now been merged. https://git.opendaylight.org/gerrit/#/c/27867/ We may be able to close the longest-standing known perf bug in ODL. |
| Comment by Abhijit Kumbhare [ 09/Nov/15 ] |
|
Fixed by Sania's patch: https://git.opendaylight.org/gerrit/#/c/27867/ |
| Comment by A H [ 23/Nov/15 ] |
|
Should we reopen this bug since it was reverted for stable/lithium in the patch below? |
| Comment by Luis Gomez [ 23/Nov/15 ] |
|
I just reopened it. |
| Comment by Sania Zara [ 03/Dec/15 ] |
|
We employed black-box analysis to locate the source of this performance issue with minimal understanding of the ODL code. So the solution we provided doesn't work well, and I can now see why. |
| Comment by Colin Dixon [ 19/Jan/16 ] |
|
Is there a plan for how to incorporate the fact that we know the root cause of the bug into an actual fix? Also, could we please explain here why the fix was reverted? |
| Comment by Jamo Luhrsen [ 19/Jan/16 ] |
|
(In reply to Colin Dixon from comment #26) Once this fix was merged, a new bug (4649) was uncovered. It was decided to fix 4649 by reverting the patch that fixed this bug (1395). |
| Comment by Jozef Bacigal [ 17/Aug/16 ] |
|
Closing this bug. It actually concerned performance degradation in the Helium design, which we no longer use in Boron/Carbon. Any project or person who needs this fixed for the Beryllium SR4 Helium design should raise a new bug, but I doubt there is time for such a thing; we have a lot of work with the new releases. |