Uploaded image for project: 'OpenFlowPlugin'
  1. OpenFlowPlugin
  2. OPNFLWPLUG-920

Cluster member gets in bad state if 2 instances disconnect just after the 3rd is recovered

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • None
    • None
    • General
    • None
    • Operating System: All
      Platform: All

    • 8805

      This happens in stable/carbon. To reproduce:

      1) Connect single switch to cluster
      sudo ovs-vsctl set-controller s1 "tcp:192.168.0.101:6633" "tcp:192.168.0.102:6633" "tcp:192.168.0.103:6633"

      2) Wait until all connections get stablished, master and slave after ~20 secs.

      3) Break connection Master to switch:
      sudo iptables -A INPUT -s 192.168.0.103 -p tcp --sport 6633 -j DROP

      4) Wait until new master is elected and then restore the connection:
      sudo iptables -D INPUT -s 192.168.0.103 -p tcp --sport 6633 -j DROP

      5) Now that connection is waiting for slave break new master and remaining connection:
      sudo iptables -A INPUT -s 192.168.0.101 -p tcp --sport 6633 -j DROP
      sudo iptables -A INPUT -s 192.168.0.102 -p tcp --sport 6633 -j DROP

      6) Following is observed in the remaining node (102), after that only solution is to reboot the node:

      2017-07-06 01:06:06,352 | INFO | ofppool-0 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Starting clustering services for node openflow:1
      2017-07-06 01:06:06,352 | INFO | ofppool-0 | DeviceContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Starting device context cluster services for node openflow:1
      2017-07-06 01:06:10,319 | INFO | entLoopGroup-7-2 | SystemNotificationsListenerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Connection closed by device, Device:/192.168.0.24:49423, NodeId:openflow:1
      2017-07-06 01:06:10,319 | INFO | entLoopGroup-7-2 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Device openflow:1 disconnected.
      2017-07-06 01:06:10,319 | INFO | entLoopGroup-7-2 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Closing clustering services registration for node openflow:1
      2017-07-06 01:06:10,323 | WARN | entLoopGroup-7-2 | OF13DeviceInitializer | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Failed to write node openflow:1 to DS because we failed to gather device info.
      2017-07-06 01:06:10,323 | ERROR | entLoopGroup-7-2 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:10,324 | WARN | entLoopGroup-7-2 | OF13DeviceInitializer | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Failed to retrieve static node OFPMPMETERFEATURES info: Outbound queue wasn't able to reserve XID.
      2017-07-06 01:06:10,324 | ERROR | entLoopGroup-7-2 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:10,324 | WARN | entLoopGroup-7-2 | OF13DeviceInitializer | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Failed to retrieve static node OFPMPGROUPFEATURES info: Outbound queue wasn't able to reserve XID.
      2017-07-06 01:06:10,325 | ERROR | entLoopGroup-7-2 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:10,325 | WARN | entLoopGroup-7-2 | OF13DeviceInitializer | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Failed to retrieve static node OFPMPPORTDESC info: Outbound queue wasn't able to reserve XID.
      2017-07-06 01:06:10,325 | INFO | entLoopGroup-7-2 | OF13DeviceInitializer | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Static node openflow:1 successfully finished collecting
      2017-07-06 01:06:10,325 | INFO | entLoopGroup-7-2 | StatisticsContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Stopping running statistics gathering for node openflow:1
      2017-07-06 01:06:10,326 | INFO | ofppool-0 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

      {targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

      ], augmentation=[]]
      2017-07-06 01:06:10,326 | INFO | ofppool-0 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Device NodeRef [_value=KeyedInstanceIdentifier

      {targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

      ] has been disconnected
      2017-07-06 01:06:10,326 | ERROR | ofppool-0 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | SetRoleService set Role BECOMEMASTER for Node: KeyedInstanceIdentifier

      {targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

      fail . Reason java.lang.Exception: Device connection doesn't exist anymore. Primary connection status : RIP
      2017-07-06 01:06:10,326 | WARN | ofppool-0 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Not able to set MASTER role on device openflow:1, reason: Was not able to set MASTER role on device
      2017-07-06 01:06:10,332 | INFO | ult-dispatcher-5 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Closing clustering services for node openflow:1
      2017-07-06 01:06:10,334 | INFO | ofppool-0 | StatisticsContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Starting statistics context cluster services for node openflow:1
      2017-07-06 01:06:10,335 | WARN | ofppool-0 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Not able to set MASTER role on device openflow:1, reason: Initial gathering statistics unsuccessful.
      2017-07-06 01:06:10,335 | INFO | ofppool-0 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Starting rpc context cluster services for node openflow:1
      2017-07-06 01:06:10,336 | INFO | lt-dispatcher-29 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Removing device Uri [_value=openflow:1] from operational DS
      2017-07-06 01:06:11,380 | ERROR | n-dispatcher-590 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:11,381 | WARN | n-dispatcher-590 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:11,380 | ERROR | n-dispatcher-558 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:11,381 | WARN | n-dispatcher-558 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:11,382 | ERROR | n-dispatcher-589 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:11,384 | WARN | n-dispatcher-589 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:11,384 | ERROR | n-dispatcher-589 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:11,384 | WARN | n-dispatcher-589 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:12,377 | ERROR | ult-dispatcher-3 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:12,378 | WARN | ult-dispatcher-3 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:14,901 | ERROR | pool-26-thread-1 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:14,902 | WARN | pool-26-thread-1 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1
      2017-07-06 01:06:14,902 | ERROR | pool-26-thread-1 | OutboundQueueProviderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | No queue present, failing request
      2017-07-06 01:06:14,902 | WARN | pool-26-thread-1 | RpcContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:openflow:1

            ecelgp Luis Gomez
            ecelgp Luis Gomez
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: