Uploaded image for project: 'OpenFlowPlugin'
  1. OpenFlowPlugin
  2. OPNFLWPLUG-970

Unable to establish connections with switches after frequent connecting and disconnecting

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Highest Highest
    • Oxygen
    • Carbon-SR3, Carbon-SR2
    • openflowplugin-impl
    • None
    • Operating System: Ubuntu 14.04
      Platform: PC

      My environment has three physical switches connecting to the opendaylight controller of carbon-sr2.

      After keeping making two of the switches connected and disconnected frequently for a while, the opendaylight controller couldn't set up the connections with switches any longer. Please see the attachment for karaf log.

       

      One fact is that everything started to go wrong when the following log message showed up:

      2018-01-08 14:21:35,601 | WARN | entLoopGroup-7-8 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 is already in termination state, closing all incoming connections.

       

      Then all connections will be closed after that like this:

      2018-01-08 14:21:35,601 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 connected.
      2018-01-08 14:21:35,601 | WARN | entLoopGroup-7-8 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 is already in termination state, closing all incoming connections.

       

      Here is part of the log (filter by openflow:35776560030) when it went wrong:

      2018-01-08 14:19:24,615 | WARN | lt-dispatcher-19 | ContextChainImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Not able to start clustering services for node openflow:35776560030
      2018-01-08 14:19:24,615 | WARN | ofppool-2 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Not able to set MASTER role on device openflow:35776560030, reason: Failed to initialize device openflow:35776560030 in 9s: java.util.concurrent.TimeoutException: Timeout waiting for task.
      2018-01-08 14:19:24,615 | WARN | ofppool-2 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | This mastering is mandatory, destroying context chain and closing connection for device openflow:35776560030.
      2018-01-08 14:19:24,616 | INFO | ofppool-2 | ContextChainImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Closed clustering services registration for node openflow:35776560030
      2018-01-08 14:19:24,621 | INFO | ult-dispatcher-3 | ContextChainImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Closing clustering services for node openflow:35776560030
      2018-01-08 14:19:24,621 | INFO | ult-dispatcher-3 | StatisticsContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Stopping statistics context cluster services for node openflow:35776560030
      2018-01-08 14:19:24,621 | INFO | ult-dispatcher-3 | StatisticsContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Stopping running statistics gathering for node openflow:35776560030
      2018-01-08 14:19:24,621 | INFO | ult-dispatcher-3 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Stopping rpc context cluster services for node openflow:35776560030
      2018-01-08 14:19:24,621 | INFO | ult-dispatcher-3 | DeviceContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Stopping device context cluster services for node openflow:35776560030
      2018-01-08 14:19:24,622 | INFO | ult-dispatcher-3 | ContextChainImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Closed clustering services for node openflow:35776560030
      2018-01-08 14:19:24,625 | DEBUG | lt-dispatcher-20 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Entity openflow:35776560030 has no owner
      2018-01-08 14:19:24,625 | INFO | lt-dispatcher-20 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Removing device Uri [_value=openflow:35776560030] from operational DS
      2018-01-08 14:19:24,626 | DEBUG | CommitFutures-1 | DeviceManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Delete Node KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:35776560030]]]]} was successful
      2018-01-08 14:19:32,642 | INFO | ofppool-2 | SystemNotificationsListenerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | ConnectionEvent:Closing connection as device is idle. Echo sent at Mon Jan 08 14:19:30 CST 2018. Device:/192.168.1.103:29990, NodeId:openflow:35776560030
      2018-01-08 14:19:32,642 | DEBUG | ofppool-2 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Actively closing connection: device=openflow:35776560030 | auxiliaryId=0 | connectionState=RIP
      2018-01-08 14:19:32,642 | DEBUG | ofppool-2 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying unregister outbound queue handler registration for node openflow:35776560030
      2018-01-08 14:19:32,642 | DEBUG | ofppool-2 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying closing handshake context for node openflow:35776560030
      2018-01-08 14:19:32,642 | DEBUG | ofppool-2 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Propagating connection closed event: /192.168.1.103:29990, datapathId:35776560030.
      2018-01-08 14:19:32,643 | INFO | ofppool-2 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 disconnected.
      2018-01-08 14:19:32,644 | DEBUG | ofppool-2 | ContextChainImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | ContextChain for node openflow:35776560030 is already in TERMINATION state.
      2018-01-08 14:21:35,601 | DEBUG | entLoopGroup-7-8 | HandshakeListenerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | succeeded by getting sweep barrier after post-handshake for device openflow:35776560030
      2018-01-08 14:21:35,601 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 connected.
      2018-01-08 14:21:35,601 | WARN | entLoopGroup-7-8 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 is already in termination state, closing all incoming connections.
      2018-01-08 14:21:35,601 | DEBUG | entLoopGroup-7-8 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Actively closing connection: device=openflow:35776560030 | auxiliaryId=0 | connectionState=RIP
      2018-01-08 14:21:35,601 | DEBUG | entLoopGroup-7-8 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying unregister outbound queue handler registration for node openflow:35776560030
      2018-01-08 14:21:35,601 | DEBUG | entLoopGroup-7-8 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying closing handshake context for node openflow:35776560030
      2018-01-08 14:21:43,691 | DEBUG | ntLoopGroup-7-10 | HandshakeListenerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | succeeded by getting sweep barrier after post-handshake for device openflow:35776560030
      2018-01-08 14:21:43,692 | INFO | ntLoopGroup-7-10 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 connected.
      2018-01-08 14:21:43,692 | WARN | ntLoopGroup-7-10 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 is already trying to connect, wait until succeeded or disconnected.
      2018-01-08 14:21:43,692 | DEBUG | ntLoopGroup-7-10 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Actively closing connection: device=openflow:35776560030 | auxiliaryId=0 | connectionState=RIP
      2018-01-08 14:21:43,692 | DEBUG | ntLoopGroup-7-10 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying unregister outbound queue handler registration for node openflow:35776560030
      2018-01-08 14:21:43,692 | DEBUG | ntLoopGroup-7-10 | ConnectionContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Trying closing handshake context for node openflow:35776560030
      2018-01-08 14:21:51,812 | DEBUG | ntLoopGroup-7-12 | HandshakeListenerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | succeeded by getting sweep barrier after post-handshake for device openflow:35776560030
      2018-01-08 14:21:51,812 | INFO | ntLoopGroup-7-12 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 connected.
      2018-01-08 14:21:51,813 | WARN | ntLoopGroup-7-12 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:35776560030 is already trying to connect, wait until succeeded or disconnected.

       

      I tried to analyze it, it seems sometimes the failure in setting master role on switches would cause this issue.

      The pattern is like: Failure in setting master role -> destroying context chain -> Closing connection as device is idle. -> disconnected -> conencted -> already in termination state -> connected -> already trying to connect -> connected -> already trying to connect -> ...

        1. karaf.log
          789 kB
        2. stack trace.pdf
          65 kB

            jamescch Chien-Hsin Chen
            jamescch Chien-Hsin Chen
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: