[OPNFLWPLUG-322] Need to close the ODL Denial of Service interface Created: 24/Nov/14 Updated: 27/Sep/21 Resolved: 13/Oct/15 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Jim West | Assignee: | Michal Rehak |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
| External issue ID: | 2429 |
| Description |
|
In some of my testing I've discovered an external message and code path that will cause ODL to fail. If left unfixed, I am concerned that this could be used for a Denial of Service attack.

Internally, we use 'monit' to ensure all core processes are up. Last month we added a line to our monit configuration that causes monit to connect to ODL (via the loop-back interface) and then disconnect. After this line was added, we discovered that our ODL installations would fail after a few days of running. I opened https://bugs.opendaylight.org/show_bug.cgi?id=2394 to track a specific problem associated with this (OFHandshake threads building up).

When an external entity opens an OpenFlow connection but does not complete the handshake, I see log messages along the lines of:

2014-11-24 15:17:42.200 UTC [OFHandshake-7-0] WARN o.o.o.o.m.c.ErrorHandlerSimpleImpl - exception -> FIRST HELLO sending failed because of connection issue., session -> null

Resources are not freed and ODL eventually locks up. I've patched my code with my patch for 2394, but I am concerned that there are other code paths that I haven't investigated. |
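A minimal sketch of the failure mode described above, not ODL code: a handler thread that waits for the peer's HELLO will linger forever unless the socket has a timeout and is closed on failure. All names and values below are illustrative assumptions.

```python
# Illustrative sketch only (not ODL code): a connection that is opened but
# never completes the handshake must be bounded by a timeout and closed,
# otherwise its handler thread leaks. All names/values here are invented.
import socket
import threading

HANDSHAKE_TIMEOUT = 0.5  # seconds; illustrative, not ODL's actual value

def handle_one(listener, results):
    """Accept one connection, wait briefly for HELLO bytes, then clean up."""
    conn, _ = listener.accept()
    conn.settimeout(HANDSHAKE_TIMEOUT)      # without this, recv() can block forever
    try:
        results.append(bool(conn.recv(8)))  # did the peer ever send anything?
    except socket.timeout:
        results.append(False)               # handshake never completed
    finally:
        conn.close()                        # free the resource either way

listener = socket.socket()
listener.bind(("127.0.0.1", 0))             # ephemeral port for the demo
listener.listen(1)
port = listener.getsockname()[1]

results = []
worker = threading.Thread(target=handle_one, args=(listener, results))
worker.start()

# Behave like the monit probe: connect and then send nothing.
probe = socket.create_connection(("127.0.0.1", port))
worker.join(timeout=5)                      # handler exits via the timeout path
probe.close()
listener.close()
print("handshake completed:", results[0])
```

With the timeout in place the handler thread exits and the socket is closed even though the probe never spoke; without it, each such probe would pin one thread, which is the build-up the description reports.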
| Comments |
| Comment by Abhijit Kumbhare [ 24/Feb/15 ] |
|
Jim, do you have a patch for this? If not, we will need to assign this bug to someone. Thanks. |
| Comment by Jim West [ 24/Feb/15 ] |
|
(In reply to Abhijit Kumbhare from comment #1) Hi Abhijit, I have a patch associated with 2394. With this patch, I don't see the problem any longer. I'm not sure that the patch for 2394 covers all the code paths (that's the reason I opened this bug). |
| Comment by Abhijit Kumbhare [ 06/May/15 ] |
|
Can we lower this from critical to major or normal? |
| Comment by Abhijit Kumbhare [ 06/May/15 ] |
|
This needs a retest, as the handshake thread model was changed in the Li codebase and this bug concerns close and cleanup after a handshake failure. |
| Comment by Jim West [ 06/May/15 ] |
|
(In reply to Abhijit Kumbhare from comment #3) What's the reasoning for lowering the importance? Is it because we think it's fixed? |
| Comment by Abhijit Kumbhare [ 06/May/15 ] |
|
(In reply to Jim West from comment #5) Reason: you no longer see the issue with your patch, and the developers may need to concentrate on the other higher-priority bugs at this moment (during the bug fix cycle). In any case, we need a retest of this (see my previous comment) - can you please retest? |
| Comment by Jamo Luhrsen [ 07/May/15 ] |
|
This is an easy issue to reproduce with the "nc" Linux tool. With a profiler (e.g. jvisualvm) you can watch the thread count rise, or this Linux CLI will work too: watch -n1 ps -o nlwp $ODL_PID. I've recreated the thread-count growth on the Helium release bits as well as using this distro. At approximately 30k working threads there may be some trouble with OOM or file descriptors. In every case, the controller proves to be useless. |
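The exact "nc" steps did not survive the export, so here is a hedged reconstruction of the idea: open many TCP connections to the OpenFlow port and never send a HELLO. The helper name, host, port, and counts are assumptions; the demo at the bottom runs against a throwaway local listener rather than a real controller.

```python
# Hedged reconstruction of the "nc"-style reproduction (the original steps
# were elided in the export). open_idle_connections and all host/port/count
# values are assumptions. Against a real controller you would target port
# 6633/6653 and then watch the thread count, e.g.: watch -n1 ps -o nlwp $ODL_PID
import socket

def open_idle_connections(host, port, count):
    """Open `count` TCP connections and send no OpenFlow bytes at all."""
    socks = []
    for _ in range(count):
        s = socket.create_connection((host, port), timeout=5)
        socks.append(s)  # hold them open so no handshake can ever complete
    return socks

# Self-contained demo against a throwaway local listener (stands in for ODL).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(16)                 # backlog large enough for the demo
port = listener.getsockname()[1]

socks = open_idle_connections("127.0.0.1", port, 5)
print("idle connections held:", len(socks))

for s in socks:
    s.close()
listener.close()
```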
| Comment by Abhijit Kumbhare [ 09/May/15 ] |
|
We should not reduce the priority of this. |
| Comment by Jamo Luhrsen [ 11/May/15 ] |
|
This issue is NOT seen in Helium SR3, but is seen in all Helium releases prior to it. I have manually checked this on Helium, Helium-SR1 and Helium-SR2. |
| Comment by Anil Vishnoi [ 11/May/15 ] |
|
Yes, this issue was fixed in Helium SR3, so this is expected. It means this is a regression on the master branch. |
| Comment by Jozef Gloncak [ 12/May/15 ] |
|
This patch should fix the problem in the Li codebase. |
| Comment by Michal Rehak [ 14/May/15 ] |
|
Please retest and, if appropriate, mark this resolved. |
| Comment by Jamo Luhrsen [ 14/May/15 ] |
|
This is resolved, and automation is there to show this. Search the |
| Comment by Peter Gubka [ 25/Jun/15 ] |
|
The test created for this bug started failing for the Lithium redesigned openflow plugin in stable/lithium. |
| Comment by Michal Rehak [ 25/Jun/15 ] |
| Comment by Jamo Luhrsen [ 26/Jun/15 ] |
|
I've fixed the test now: https://git.opendaylight.org/gerrit/#/c/23285/ Lithium (the release distro) does not fail this test case, and I was able to verify that with the sandbox. Marking this RESOLVED/FIXED. |
| Comment by Jamo Luhrsen [ 25/Sep/15 ] |
|
This issue still seems to be there. It's not showing up in CI for some reason; I've reproduced it with the Robot suite locally. It has failed with both |
| Comment by Luis Gomez [ 25/Sep/15 ] |
|
Very good Jamo |
| Comment by Abhijit Kumbhare [ 09/Oct/15 ] |
|
Michal, can you look at this one before the next SR? I think the next SR is in a month or two. Abhijit |
| Comment by Abhijit Kumbhare [ 09/Oct/15 ] |
|
Luis says it fails in both the He & Li designs. Is this related to OF Java, Michal Polkorab? |
| Comment by Michal Rehak [ 12/Oct/15 ] |
|
Hi all, testing the redesigned plugin I got no issues. I ran a connect/disconnect cycle, for (( i=0; i<1000; i++ )); do ... ; done, and checked open connections with netstat -pn | grep -E '66[35]3' | nl. The amount of threads used by the controller was around 150, and while running this cycle the thread count was always less than 160. |
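The loop body of the cycle above was lost in the export. A hedged Python equivalent, assuming the body simply connected and disconnected as the monit probe did; the host, port, and iteration count are assumptions, and the demo runs against a throwaway local listener rather than a controller.

```python
# Hedged equivalent of the connect/disconnect verification cycle; the
# original loop body was elided, so this assumes it opened and immediately
# closed a connection each iteration (like the monit probe). Host/port and
# iteration count are assumptions; the demo targets a local stand-in listener.
import socket
import threading

def churn_connections(host, port, iterations):
    """Connect and immediately disconnect `iterations` times; count successes."""
    ok = 0
    for _ in range(iterations):
        s = socket.create_connection((host, port), timeout=5)
        s.close()                 # a fixed controller must clean up after this
        ok += 1
    return ok

# Demo: a draining listener stands in for the controller under test.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(64)
port = listener.getsockname()[1]

def drain():
    for _ in range(100):
        conn, _ = listener.accept()
        conn.close()

drainer = threading.Thread(target=drain, daemon=True)
drainer.start()

completed = churn_connections("127.0.0.1", port, 100)
drainer.join(timeout=5)
listener.close()
print("cycles completed:", completed)
```

On a fixed controller this churn should leave the thread count roughly flat (as Michal observed, ~150 before and under 160 during); on an affected build each unfinished handshake would pin a thread.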
| Comment by Jamo Luhrsen [ 13/Oct/15 ] |
|
(In reply to Michal Rehak from comment #22) Yes, you are right. The test was working quickly enough to accidentally pass: https://git.opendaylight.org/gerrit/28356 I verified manually that this new test code will fail on Helium SR2, where the issue is present. I also ran extra manual tests on Lithium SR2 to ensure I don't see this issue there. |