[OPNFLWPLUG-156] The Finisher queue size is unbounded and causes the controller to run out of memory under stress Created: 12/May/14 Updated: 27/Sep/21 Resolved: 13/May/14 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Jan Medved | Assignee: | Jan Medved |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Mac OS |
||
| Attachments: |
|
| External issue ID: | 985 |
| Description |
|
The TicketFinisher queue is unbounded (created with Integer.MAX_VALUE). Now that the MD-SAL queues apply backpressure, the plugin's Finisher queue will grow indefinitely and cause the controller to run out of memory under stress. To reproduce, run the cbench throughput test on a controller that uses the OF1.3 plugin, for example: cbench -c 192.168.162.1 -p 6633 -m 1000 -l 10 -s 16 -M 100000 - This bug is similar to The Finisher queue should be created with a maximum number of elements (say 500 or 1000), and that maximum should be configurable. |
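A minimal sketch of the proposed change, assuming the Finisher queue is a plain java.util.concurrent blocking queue. The class name, the system property name `finisher.queue.capacity`, and the helper methods are hypothetical and are not taken from the plugin source; only the default of 1000 elements comes from this report.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedFinisherQueueSketch {
    // Hypothetical property name, used here only for illustration.
    static final String CAPACITY_PROPERTY = "finisher.queue.capacity";
    // Default taken from the suggestion in this report (500 or 1000).
    static final int DEFAULT_CAPACITY = 1000;

    // Reads the capacity from a system property, falling back to the default.
    static int configuredCapacity() {
        int value = Integer.getInteger(CAPACITY_PROPERTY, DEFAULT_CAPACITY);
        return Math.max(1, value); // ArrayBlockingQueue rejects capacities below 1
    }

    // A bounded queue: it can never hold more than 'capacity' elements.
    static <T> BlockingQueue<T> newFinisherQueue(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: returns false when the queue is full, so the
    // caller can push back on the producer instead of buffering without limit.
    static <T> boolean enqueue(BlockingQueue<T> queue, T ticket) {
        return queue.offer(ticket);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = newFinisherQueue(2);
        System.out.println(enqueue(queue, "ticket-1")); // accepted
        System.out.println(enqueue(queue, "ticket-2")); // accepted
        System.out.println(enqueue(queue, "ticket-3")); // rejected, queue is full
        System.out.println(queue.size());
    }
}
```

With a bound in place, a full queue surfaces as a rejected `offer` instead of heap growth. Whether the producer should then block, retry, or drop the message is a separate design decision that this sketch does not settle.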
| Comments |
| Comment by Jan Medved [ 12/May/14 ] |
|
Changed the queue initialization to limit the Finisher queue to 1000 elements in https://git.opendaylight.org/gerrit/6885. The Finisher queue no longer grows out of bounds, but another out-of-memory condition is manifesting itself somewhere in the parser - see the attached memory/object dump from the YourKit profiler. |
| Comment by Jan Medved [ 12/May/14 ] |
|
Attachment Class-list.zip has been added with description: class list dump from YourKit |
| Comment by Jan Medved [ 12/May/14 ] |
|
Showing the objects on the Hash map that did not get garbage collected |
| Comment by Jan Medved [ 12/May/14 ] |
|
Attachment Object-explorer.zip has been added with description: Object Dump from YourKit |
| Comment by Jan Medved [ 12/May/14 ] |
|
(In reply to Jan Medved from comment #0) One more thing: this change allows 3-4 runs of cbench in throughput mode, whereas before the controller locked up after one run. |
| Comment by Robert Varga [ 12/May/14 ] |
|
I have run a simple 'start SP' test and profiled it. It turns out we do have a leak in yangtools – one of the node implementations is holding a reference which it should not. We have pushed a few improvements to memory usage already; expect to lose ~120MB once the fix is in. It can also explain why NETCONF has such sucky scaling numbers. |
| Comment by Robert Varga [ 12/May/14 ] |
|
BUG-987 is tracking the yangtools thing. |
| Comment by Michal Rehak [ 13/May/14 ] |
|
please verify |