Uploaded image for project: 'controller'
  1. controller
  2. CONTROLLER-969

Multiple CLOSE_WAIT connections after unhandled memory exception during stress test

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Do
    • None
    • Helium
    • adsal
    • None
    • Operating System: Linux
      Platform: PC

    • 2270
    • High

      After a stress test scenario with kc-cbench tool, with a few thousands of
      datapaths, a file descriptor leak is taking place due to closed_wait tcp connections. Problem is accumulative until JVM restart.

      Steps to reproduce

      1) Start only the sdn controller using run.sh
      2) run a tool e.g. cbench -s 20000 (kc-cbench)to create thousands of tcp
      connections in port 6633 until you see the following error :

      2014-10-29 10:46:20.146 EET [ControllerI/O Thread] ERROR o.o.c.l.b.i.UncaughtExceptionPolicy - Thread Thread[ControllerI/O Thread,5,main] died because of an uncaught exception
      java.lang.OutOfMemoryError: Direct buffer memory

      NOTE:
      -The number of connections limit for the error to happen should vary depending on set JVM memory limit or could even not happen if e.g. osgi is not stressed a lot.
      Set JVM limit to "-Xmx1G" to avoid this.
      -The way kc-cbench sdn performance tool initiates the connections could play a role in this scenario so it is recommended for reproducing it.

      3) check with netstat -anp | grep 6633
      Many connections stay in close wait state waiting for the controller
      to close them but due to the uncaught exception they remain open leading to system resources leak (file descriptors).

      tcp6 1 0 127.0.0.1:6633 127.0.0.1:33466 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:50346 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:38301 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:48533 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:37408 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:53192 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:52902 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:34564 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:49069 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:56437 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:58956 CLOSE_WAIT
      tcp6 1 0 127.0.0.1:6633 127.0.0.1:47915 CLOSE_WAIT

      ******************************************************************************
      ...MANY SIMILAR MESSAGES FOR EACH CONNECTION
      ******************************************************************************
      2014-10-29 10:46:19.650 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:60968 is connected to the Controller
      2014-10-29 10:46:19.652 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:58972 is connected to the Controller
      2014-10-29 10:46:19.653 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:51264 is connected to the Controller
      2014-10-29 10:46:19.655 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:54264 is connected to the Controller
      2014-10-29 10:46:19.656 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:36014 is connected to the Controller
      2014-10-29 10:46:19.658 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:42616 is connected to the Controller
      2014-10-29 10:46:19.659 EET [ControllerI/O Thread] INFO o.o.c.p.o.core.internal.Controller - Switch:127.0.0.1:50015 is connected to the Controller
      2014-10-29 10:46:20.146 EET [ControllerI/O Thread] ERROR o.o.c.l.b.i.UncaughtExceptionPolicy - Thread Thread[ControllerI/O Thread,5,main] died because of an uncaught exception
      java.lang.OutOfMemoryError: Direct buffer memory
      at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_67]
      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_67]
      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_67]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.MessageReadWriteService.<init>(MessageReadWriteService.java:47) ~[na:na]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.getMessageReadWriteService(SwitchHandler.java:826) ~[na:na]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.setupCommChannel(SwitchHandler.java:812) ~[na:na]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.SwitchHandler.start(SwitchHandler.java:149) ~[na:na]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.Controller.handleNewConnection(Controller.java:247) ~[na:na]
      at org.opendaylight.controller.protocol_plugin.openflow.core.internal.ControllerIO$1.run(ControllerIO.java:90) ~[na:na]
      at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_67]

      Possible solution :
      catch memory exception and close possible connections.
      At closed_wait state client should have already sent FIN to controller but due to the exception, some connections are staying in this state since the controller did not have the time to close them and send FIN.

            Unassigned Unassigned
            ekontothanasis@yahoo.co.uk EPAMEINONDAS KONTOTHANASIS
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: