-
Bug
-
Resolution: Won't Do
-
None
-
Helium
-
None
-
Operating System: Linux
Platform: PC
-
2270
-
High
After a stress test scenario with kc-cbench tool, with a few thousands of
datapaths, a file descriptor leak is taking place due to closed_wait tcp connections. Problem is accumulative until JVM restart.
Steps to reproduce
1) Start only the sdn controller using run.sh
2) run a tool e.g. cbench -s 20000 (kc-cbench)to create thousands of tcp
connections in port 6633 until you see the following error :
2014-10-29 10:46:20.146 EET [ControllerI/O Thread] ERROR o.o.c.l.b.i.UncaughtExceptionPolicy - Thread Thread[ControllerI/O Thread,5,main] died because of an uncaught exception
java.lang.OutOfMemoryError: Direct buffer memory
NOTE:
-The number of connections limit for the error to happen should vary depending on set JVM memory limit or could even not happen if e.g. osgi is not stressed a lot.
Set JVM limit to "-Xmx1G" to avoid this.
-The way kc-cbench sdn performance tool initiates the connections could play a role in this scenario so it is recommended for reproducing it.
3) check with netstat -anp | grep 6633
Many connections stay in close wait state waiting for the controller
to close them but due to the uncaught exception they remain open leading to system resources leak (file descriptors).
tcp6 1 0 127.0.0.1:6633 127.0.0.1:33466 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:50346 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:38301 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:48533 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:37408 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:53192 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:52902 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:34564 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:49069 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:56437 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:58956 CLOSE_WAIT
tcp6 1 0 127.0.0.1:6633 127.0.0.1:47915 CLOSE_WAIT
******************************************************************************
...MANY SIMILAR MESSAGES FOR EACH CONNECTION
******************************************************************************
2014-10-29 10:46:19.650 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:60968 is connected to the Controller
2014-10-29 10:46:19.652 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:58972 is connected to the Controller
2014-10-29 10:46:19.653 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:51264 is connected to the Controller
2014-10-29 10:46:19.655 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:54264 is connected to the Controller
2014-10-29 10:46:19.656 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:36014 is connected to the Controller
2014-10-29 10:46:19.658 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:42616 is connected to the Controller
2014-10-29 10:46:19.659 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:50015 is connected to the Controller
2014-10-29 10:46:20.146 EET [ControllerI/O Thread] ERROR o.o.c.l.b.i.UncaughtExceptionPolicy - Thread Thread[ControllerI/O Thread,5,main] died because of an uncaught exception
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_67]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_67]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_67]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.MessageReadWriteService.<init>(MessageReadWriteService.java:47) ~[na:na]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.getMessageReadWriteService(SwitchHandler.java:826) ~[na:na]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.setupCommChannel(SwitchHandler.java:812) ~[na:na]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.start(SwitchHandler.java:149) ~[na:na]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.Controller.handleNewConnection(Controller.java:247) ~[na:na]
at org.opendaylight.controller.protocol_plugin.openflow.core.internal.ControllerIO$1.run(ControllerIO.java:90) ~[na:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_67]
Possible solution :
catch memory exception and close possible connections.
At closed_wait state client should have already sent FIN to controller but due to the exception, some connections are staying in this state since the controller did not have the time to close them and send FIN.