Uploaded image for project: 'OpenFlowPlugin'
  1. OpenFlowPlugin
  2. OPNFLWPLUG-1018

OVS not able to connect to ODL - contextChain stuck in CLOSED state

XMLWordPrintable

      We have seen sporadic failures where OVS on a compute node fails to connect to ODL when set-controller is executed on the compute node. These failures usually occur during deployment.

      When this failure occurs, OVS on the compute node continues attempting to connect to ODL. Openflowplugin immediately closes the new incoming connection, as it thinks the connection is in termination state. This cycle repeats endlessly once we are in this state.

       

      Since this is a random, intermittent, sorta difficult to reproduce bug (and would take forever having to try and hit this by having to redeploy every time), I wrote a small test script to reproduce this issue. Script is attached as test.sh.

      I enabled TRACE logs for openflowplugin.impl.lifecycle, and was able to reproduce the issue. I've been looking at the logs and code, but would be really great if someone on the openflowplugin team could jump in and take a look.

      I've included a snippet of some relevant karaf and ovs-vswitchd logs below. Full karaf log and ovs-vswitchd logs are attached. 

      The timestamp of interest in the karaf logs starts around timestamp  2018-06-08T22:32:11,465, which is line number 4661.

       

      Setup

      Devstack setup, with 1 control node (VM) and 2 compute nodes (VMs) on 1 bm server.

      Devstack is stable/queens.

      ODL is stable/oxygen.

       

      Karaf logs

      2018-06-08T22:32:11,465 | WARN  | epollEventLoopGroup-9-6 | ContextChainHolderImpl           | 401 - org.opendaylight.openflowplugin.impl - 0.7.0.S
      NAPSHOT | Device openflow:220918713701936 is already in termination state, closing all incoming connections.
      2018-06-08T22:32:11,870 | INFO  | epollEventLoopGroup-9-7 | AbstractConnectionAdapter        | 410 - org.opendaylight.openflowplugin.openflowjava.o
      penflow-protocol-impl - 0.7.0.SNAPSHOT | The channel outbound queue size:1024
      2018-06-08T22:32:11,871 | INFO  | epollEventLoopGroup-9-7 | ConnectionAdapterImpl            | 410 - org.opendaylight.openflowplugin.openflowjava.o
      penflow-protocol-impl - 0.7.0.SNAPSHOT | Hello received
      2018-06-08T22:32:11,873 | INFO  | epollEventLoopGroup-9-7 | ContextChainHolderImpl           | 401 - org.opendaylight.openflowplugin.impl - 0.7.0.S
      NAPSHOT | Device openflow:220918713701936 connected.
      2018-06-08T22:32:11,873 | WARN  | epollEventLoopGroup-9-7 | ContextChainHolderImpl           | 401 - org.opendaylight.openflowplugin.impl - 0.7.0.S
      NAPSHOT | Device openflow:220918713701936 is already in termination state, closing all incoming connections.

      2018-06-08T22:32:13,868 | INFO  | epollEventLoopGroup-9-8 | AbstractConnectionAdapter        | 410 - org.opendaylight.openflowplugin.openflowjava.o
      penflow-protocol-impl - 0.7.0.SNAPSHOT | The channel outbound queue size:1024
      2018-06-08T22:32:13,869 | INFO  | epollEventLoopGroup-9-8 | ConnectionAdapterImpl            | 410 - org.opendaylight.openflowplugin.openflowjava.o
      penflow-protocol-impl - 0.7.0.SNAPSHOT | Hello received
      2018-06-08T22:32:13,872 | INFO  | epollEventLoopGroup-9-8 | ContextChainHolderImpl           | 401 - org.opendaylight.openflowplugin.impl - 0.7.0.S
      NAPSHOT | Device openflow:220918713701936 connected.
      2018-06-08T22:32:13,873 | WARN  | epollEventLoopGroup-9-8 | ContextChainHolderImpl           | 401 - org.opendaylight.openflowplugin.impl - 0.7.0.S
      NAPSHOT | Device openflow:220918713701936 is already in termination state, closing all incoming connections.

       

      ovs-vswitchd logs

      2018-06-08T23:04:42.855Z|13767|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:04:42.858Z|13768|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:04:50.854Z|13769|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:04:50.857Z|13770|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:04:58.854Z|13771|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:04:58.857Z|13772|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:06.855Z|13773|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:05:06.858Z|13774|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:14.853Z|13775|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:05:14.856Z|13776|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:22.854Z|13777|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:05:22.856Z|13778|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:30.855Z|13779|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:05:30.858Z|13780|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:38.853Z|13781|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected
      2018-06-08T23:05:38.855Z|13782|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connection closed by peer
      2018-06-08T23:05:46.855Z|13783|rconn|INFO|br-int<->tcp:10.8.125.230:6653: connected

       

       

       

       

        1. karaf.log.debug
          1.36 MB
        2. ovs-vswitchd.log
          1.54 MB
        3. test.sh
          0.9 kB
        4. test2.sh
          0.9 kB

            Avishnoi Anil Vishnoi
            vpickard Victor Pickard
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: