[OPNFLWPLUG-382] [Lithium redesign] OOM errors, CPU 100% with 128 connected switches Created: 18/Mar/15 Updated: 27/Sep/21 Resolved: 08/May/15 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | SANDEEP GANGADHARAN | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Operating System: Linux |
| Attachments: | log.zip, dump.png, dump2.png, openflow_multipartRequest_memoryLeak.jpg |
| External issue ID: | 2869 |
| Description |
|
1) Run karaf
3) Using mininet, connect 300 switches in a linear topology |
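For reference, a minimal sketch of this reproduction, assuming a standard Mininet install and the controller's address in place of <controller-ip> (the exact distribution and feature set are not specified in the report):
    # start the controller shell
    ./bin/karaf
    # in a separate terminal, build a 300-switch linear topology pointed at the controller
    sudo mn --topo linear,300 --controller remote,ip=<controller-ip>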
| Comments |
| Comment by SANDEEP GANGADHARAN [ 18/Mar/15 ] |
|
Attachment log.zip has been added with description: Attaching the logs |
| Comment by Jamo Luhrsen [ 17/Apr/15 ] |
|
Repeated this type of test with the Lithium redesign (i.e., the odl-openflowplugin-app-new-lldp-speaker feature). With a linear mininet topology using 128 switches, the controller will eventually (takes approx. 30 minutes) give OOM errors. Not sure how relevant these messages are, but here is a snippet from the karaf console:
Exception in thread "Thread-3042" java.lang.OutOfMemoryError: GC overhead limit exceeded |
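For context, a sketch of the Karaf shell steps to enable the feature named above, assuming it is available in the distribution under test:
    # from the karaf console
    feature:install odl-openflowplugin-app-new-lldp-speaker
    # verify it is installed
    feature:list -i | grep openflowplugin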
| Comment by Robert Varga [ 20/Apr/15 ] |
|
Can you take a memory dump (with -XX:+HeapDumpOnOutOfMemoryError, for example)? This looks like an unbounded queue or a memory leak somewhere. |
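A sketch of how the suggested flag could be wired into a Karaf-based distribution, assuming bin/setenv honors EXTRA_JAVA_OPTS; the dump path is illustrative:
    # in bin/setenv (or exported before starting karaf)
    export EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/odl-heap-dumps"
    ./bin/karaf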
| Comment by Evan Zeller [ 20/Apr/15 ] |
|
My JVM was crippled long before I saw an OutOfMemoryError, so I did a dev:dump-create as fast as I could. Here's what I see in VisualVM. |
| Comment by Evan Zeller [ 20/Apr/15 ] |
|
Attachment dump.png has been added with description: heapdump |
| Comment by Evan Zeller [ 20/Apr/15 ] |
|
Here's another dump, this time I made sure to run mininet with --switch ovsk,protocols=OpenFlow13 |
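For completeness, the corresponding Mininet invocation, assuming the same 128-switch linear topology mentioned earlier and an illustrative controller IP placeholder:
    sudo mn --topo linear,128 --controller remote,ip=<controller-ip> --switch ovsk,protocols=OpenFlow13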
| Comment by Evan Zeller [ 20/Apr/15 ] |
|
Attachment dump2.png has been added with description: heapdump_of13 |
| Comment by Jamo Luhrsen [ 21/Apr/15 ] |
|
Attachment openflow_multipartRequest_memoryLeak.jpg has been added with description: jvisual screenshot |
| Comment by Jamo Luhrsen [ 21/Apr/15 ] |
|
As Evan has already pointed out, and I have confirmed in my setup, it looks like the memory leak is around this class:
org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.protocol.rev130731.multipart.reply.multipart.reply.body.multipart.reply.table._case.multipart.reply.table.TableStatsBuilder$TableStatsImpl
The overall symptom is that memory is slowly consumed in my 100-switch topology until it hits the limit (3G in my case), and it appears tied to multipart request/reply handling. |
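One way to confirm which classes dominate the heap without waiting for a full OOM, assuming a JDK with the standard jmap tool and the Karaf JVM's pid in place of <pid>, is a class histogram:
    # the top entries should show the suspected TableStatsImpl / multipart reply objects if the leak is real
    jmap -histo:live <pid> | head -n 30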
| Comment by Abhijit Kumbhare [ 21/Apr/15 ] |
|
Added Hema in case she has any knowledge of the table features code mentioned by Jamo in the previous comment as the source of the memory leak. |
| Comment by Jamo Luhrsen [ 07/May/15 ] |
|
This does not appear to be resolved, at least as far as CI is concerned. This was using this specific distro: |
| Comment by Jamo Luhrsen [ 08/May/15 ] |
|
I left this open because of a CI scale test that was turning up an OOM Exception, but now it's been determined that the root cause in that scale test is from |