[OPNFLWPLUG-845] Switch hangs in controller if a reconnect happens just after switch is set in IDLE state Created: 27/Jan/17  Updated: 27/Sep/21  Resolved: 02/Mar/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Luis Gomez
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7689

 Description   

To reproduce:

1) Start mininet to controller:

sudo mn --controller 'remote,ip=192.168.0.2,port=6633' --topo tree,1

2) On the switch block the OF connection using iptables. This will simulate some OF agents behavior that do not close the OF connection when they are stopped:

sudo iptables -A OUTPUT -d 192.168.0.2 -j DROP

3) Wait until controller sends ECHO request and switch is declared IDLE. Controller will close the OF session in 2 secs after this message:

2017-01-27 04:50:58,890 | INFO | ofppool-0 | SystemNotificationsListenerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Switch Idle state occurred, node=/192.168.0.24:34288|auxId=0

4) Before the 2 seconds (just right after the above message) restore the OF connection:

sudo iptables -D OUTPUT -d 192.168.0.2 -j DROP

5) This what is seen in karaf log, the switch hangs in the controller and only workaround is to restart the controller:

2017-01-27 04:50:59,887 | INFO | ntLoopGroup-11-5 | SystemNotificationsListenerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | ConnectionEvent: Connection closed by device, Device:/192.168.0.24:34288, NodeId:openflow:1
2017-01-27 04:50:59,894 | WARN | ntLoopGroup-11-5 | StatisticsGatheringUtils | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Can't write to transaction, transaction chain probably closed.
2017-01-27 04:50:59,894 | WARN | ntLoopGroup-11-5 | StatisticsManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Statistics gathering for single node openflow:1 was not successful: Task was cancelled.
2017-01-27 04:51:00,704 | INFO | ntLoopGroup-11-6 | ConnectionAdapterImpl | 277 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.1.Boron-SR1 | Hello received / branch
2017-01-27 04:51:00,706 | INFO | ntLoopGroup-11-6 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | ConnectionEvent: Device connected to controller, Device:/192.168.0.24:34291, NodeId:Uri [_value=openflow:1]
2017-01-27 04:51:00,707 | INFO | ntLoopGroup-11-6 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

], augmentation=[]]
2017-01-27 04:51:00,707 | INFO | ntLoopGroup-11-6 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Requesting state change to BECOMESLAVE
2017-01-27 04:51:00,707 | INFO | ntLoopGroup-11-6 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | RoleChangeTask called on device:openflow:1 OFPRole:BECOMESLAVE
2017-01-27 04:51:00,707 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | getGenerationIdFromDevice called for device: openflow:1
2017-01-27 04:51:00,707 | INFO | ntLoopGroup-11-6 | LifecycleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Registered clustering MASTER services for node openflow:1
2017-01-27 04:51:00,709 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange called for device:Uri [_value=openflow:1], role:BECOMESLAVE
2017-01-27 04:51:00,710 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange onSuccess for device:Uri [_value=openflow:1], role:BECOMESLAVE
2017-01-27 04:51:00,712 | INFO | ult-dispatcher-2 | LifecycleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Starting clustering MASTER services for node openflow:1
2017-01-27 04:51:00,712 | INFO | ult-dispatcher-2 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Starting device context cluster services for node openflow:1
2017-01-27 04:51:00,713 | INFO | ntLoopGroup-11-6 | DeviceInitializationUtils | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | IP address of switch is: /192.168.0.24:34291
2017-01-27 04:51:00,716 | INFO | ntLoopGroup-11-6 | DeviceInitializationUtils | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Static node Uri [_value=openflow:1] info: OFPMPMETERFEATURES collected
2017-01-27 04:51:00,717 | INFO | ntLoopGroup-11-6 | DeviceInitializationUtils | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Static node Uri [_value=openflow:1] info: OFPMPGROUPFEATURES collected
2017-01-27 04:51:00,717 | INFO | ntLoopGroup-11-6 | DeviceInitializationUtils | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Static node Uri [_value=openflow:1] info: OFPMPPORTDESC collected
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

], augmentation=[]]
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Requesting state change to BECOMEMASTER
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | RoleChangeTask called on device:openflow:1 OFPRole:BECOMEMASTER
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | getGenerationIdFromDevice called for device: openflow:1
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | StatisticsContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Starting statistics context cluster services for node openflow:1
2017-01-27 04:51:00,718 | INFO | ult-dispatcher-2 | StatisticsManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Scheduling statistics poll for device: Uri [_value=openflow:1]
2017-01-27 04:51:00,719 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange called for device:Uri [_value=openflow:1], role:BECOMEMASTER
2017-01-27 04:51:00,722 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange onSuccess for device:Uri [_value=openflow:1], role:BECOMEMASTER
2017-01-27 04:51:00,891 | WARN | ofppool-0 | SystemNotificationsListenerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Exception while waiting for echoReply from [/192.168.0.24:34288] in TIMEOUTING state: Timeout waiting for task.
2017-01-27 04:51:00,891 | INFO | ofppool-0 | SystemNotificationsListenerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | ConnectionEvent:Closing connection as device is idle. Echo sent at Fri Jan 27 04:50:58 UTC 2017. Device:/192.168.0.24:34288, NodeId:openflow:1
2017-01-27 04:51:00,901 | INFO | ofppool-0 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

], augmentation=[]]
2017-01-27 04:51:00,901 | INFO | ofppool-0 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Requesting state change to BECOMESLAVE
2017-01-27 04:51:00,901 | INFO | ofppool-0 | SalRoleServiceImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | RoleChangeTask called on device:openflow:1 OFPRole:BECOMESLAVE
2017-01-27 04:51:00,901 | INFO | ofppool-0 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | getGenerationIdFromDevice called for device: openflow:1
2017-01-27 04:51:00,902 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange called for device:Uri [_value=openflow:1], role:BECOMESLAVE
2017-01-27 04:51:00,902 | INFO | ntLoopGroup-11-6 | RoleService | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | submitRoleChange onSuccess for device:Uri [_value=openflow:1], role:BECOMESLAVE
2017-01-27 04:51:08,823 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:08,824 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:13,820 | WARN | ult-dispatcher-4 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:13,821 | WARN | ult-dispatcher-4 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:16,098 | WARN | ntLoopGroup-11-6 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for port 1 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,130 | WARN | ntLoopGroup-11-6 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for port 2 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,187 | INFO | ntLoopGroup-11-6 | SystemNotificationsListenerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | ConnectionEvent: Connection closed by device, Device:/192.168.0.24:34291, NodeId:openflow:1
2017-01-27 04:51:16,187 | INFO | ntLoopGroup-11-6 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | DeviceContext for Node openflow:1 was not found. Connection is terminated without OFP context suite.
2017-01-27 04:51:16,201 | WARN | Thread-104 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 0 on port 1 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,231 | WARN | Thread-105 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 0 on port 2 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,303 | WARN | Thread-104 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 1 on port 1 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,332 | WARN | Thread-105 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 1 on port 2 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,404 | WARN | Thread-104 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 2 on port 1 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,404 | WARN | Thread-104 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Failed to update port status for port 1 on device Uri [_value=openflow:1] even after 3 retries
2017-01-27 04:51:16,433 | WARN | Thread-105 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Error processing port status message for 2 on port 2 on device Uri [_value=openflow:1] : org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
2017-01-27 04:51:16,433 | WARN | Thread-105 | DeviceContextImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Failed to update port status for port 2 on device Uri [_value=openflow:1] even after 3 retries
2017-01-27 04:51:18,821 | WARN | ult-dispatcher-4 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:18,822 | WARN | lt-dispatcher-26 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:23,821 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:23,821 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:28,821 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierImpl{context=null, type=(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet, route=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[

{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}

]}
2017-01-27 04:51:28,821 | WARN | lt-dispatcher-35 | RpcRegistry | 189 - org.opendaylight.controller.sal-remoterpc-connector - 1.4.1.Boron-SR1 | Timed out finding routers for RouteIdentifierI



 Comments   
Comment by Tomas Slusny [ 21/Feb/17 ]

I successfully reproduced this bug on my local machine, and pushed patch to Gerrit that hopefully resolves this bug (was not able to reproduce it any more when following Luis's steps).

Here is link to gerrit: https://git.opendaylight.org/gerrit/#/c/52116/

Luis, can you retest it please and verify that it is resolved?

Comment by Abhijit Kumbhare [ 23/Feb/17 ]

Luis,

Any update on this bug as Tomas mentions in the last comment it has been fixed?

Abhijit

Comment by Luis Gomez [ 24/Feb/17 ]

Yes, this seems to work in the patch.

Comment by Tomas Slusny [ 27/Feb/17 ]

Great then, patch was merged, so I think we can close this issue, right?

Comment by Anil Vishnoi [ 02/Mar/17 ]

Do we want to cherry-pick this to stable/boron?

Comment by Luis Gomez [ 02/Mar/17 ]

Yes.

Generated at Wed Feb 07 20:33:32 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.