[OPNFLWPLUG-939] Regression in Switch scalability test (carbon) Created: 04/Sep/17  Updated: 27/Sep/21  Resolved: 27/Sep/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Luis Gomez
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 9089

 Description   

It seems we can hardly do more than 100 switches in carbon/nitrogen:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-carbon/plot/Switch%20Scalability/

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-nitrogen/plot/Switch%20Scalability/

However boron seems pretty good:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-boron/plot/Switch%20Scalability/



 Comments   
Comment by Luis Gomez [ 04/Sep/17 ]

From the karaf log, it seems switches fail to connect:

2017-09-03 11:14:42,190 | ERROR | pool-43-thread-1 | OutboundQueueProviderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | No queue present, failing request
2017-09-03 11:14:42,195 | ERROR | pool-43-thread-1 | OutboundQueueProviderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | No queue present, failing request
2017-09-03 11:14:42,195 | ERROR | pool-43-thread-1 | OutboundQueueProviderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | No queue present, failing request

Comment by Luis Gomez [ 05/Sep/17 ]

Here is the karaf log with INFO level:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-nitrogen/129/odl1_karaf.log.gz

It seems like in second iteration of 200 switches, many of the OF connection remain slave (never move to master).

Comment by Tomas Slusny [ 06/Sep/17 ]

Can you try it with this patch: https://git.opendaylight.org/gerrit/#/c/62607/8 Luis?

Comment by Luis Gomez [ 06/Sep/17 ]

Sure, I just triggered the build. It will take some time to start given existing jenkins queue.

Comment by Luis Gomez [ 07/Sep/17 ]

The result from yesterday run with the patch does not show better:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-nitrogen/133/odl1_karaf.log.gz

Comment by Luis Gomez [ 14/Sep/17 ]

OK, after looking at the karaf log in detail, this is why scalability test fails and controller breaks. See below scenarios:

1) In normal scenario with few switches the device is processed immediately after it connects:

2017-09-13 18:51:47,192 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:1 connected.
2017-09-13 18:51:47,192 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | No context chain found for device: openflow:1, creating new.
2017-09-13 18:51:47,192 | INFO | entLoopGroup-7-8 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | ConnectionEvent: Device connected to controller, Device:/192.168.0.24:49443, NodeId:Uri [_value=openflow:1]
2017-09-13 18:51:47,192 | INFO | entLoopGroup-7-8 | RoleContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started timer for setting SLAVE role on device openflow:1 if no role will be set in 20s.

2017-09-13 18:51:47,200 | INFO | lt-dispatcher-18 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting DeviceContextImpl[NEW] service for node openflow:1
2017-09-13 18:51:47,202 | INFO | entLoopGroup-7-8 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | IP address of the node openflow:1 is: IpAddress [_ipv4Address=Ipv4Address [_value=192.168.0.24]]
2017-09-13 18:51:47,202 | INFO | entLoopGroup-7-8 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Port number of the node openflow:1 is: 49443
2017-09-13 18:51:47,208 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:1 info: OFPMPMETERFEATURES collected
2017-09-13 18:51:47,208 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:1 info: OFPMPGROUPFEATURES collected
2017-09-13 18:51:47,211 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:1 info: OFPMPPORTDESC collected
2017-09-13 18:51:47,211 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:1 successfully finished collecting
2017-09-13 18:51:47,211 | INFO | lt-dispatcher-18 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RpcContextImpl[NEW] service for node openflow:1
2017-09-13 18:51:47,215 | INFO | lt-dispatcher-18 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting StatisticsContextImpl[NEW] service for node openflow:1
2017-09-13 18:51:47,215 | INFO | lt-dispatcher-18 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RoleContextImpl[NEW] service for node openflow:1
2017-09-13 18:51:47,216 | INFO | lt-dispatcher-18 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:1]]]]}

], augmentation=[]]
2017-09-13 18:51:47,224 | INFO | lt-dispatcher-18 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | RoleChangeTask called on device:openflow:1 OFPRole:BECOMEMASTER
2017-09-13 18:51:47,224 | INFO | lt-dispatcher-18 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | getGenerationIdFromDevice called for device: openflow:1
2017-09-13 18:51:47,224 | INFO | lt-dispatcher-18 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started clustering services for node openflow:1
2017-09-13 18:51:47,225 | INFO | entLoopGroup-7-8 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange called for device:Uri [_value=openflow:1], role:BECOMEMASTER
2017-09-13 18:51:47,226 | INFO | entLoopGroup-7-8 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange onSuccess for device:Uri [_value=openflow:1], role:BECOMEMASTER
2017-09-13 18:51:47,226 | INFO | entLoopGroup-7-8 | FlowNodeReconciliationImpl | 268 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.5.0 | Triggering reconciliation for device NodeKey [_id=Uri [_value=openflow:1]]
2017-09-13 18:51:47,226 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:1 connection is enabled by reconciliation framework.
2017-09-13 18:51:47,229 | INFO | entLoopGroup-7-8 | StatisticsManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Scheduling statistics poll for device: Uri [_value=openflow:1]
2017-09-13 18:51:47,229 | INFO | entLoopGroup-7-8 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:1 is able to work as master.
2017-09-13 18:51:47,229 | INFO | entLoopGroup-7-8 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Publishing node added notification for Uri [_value=openflow:1]

2) When more switches start to connect simultaneously, controller takes more time to process the switches and after 200 switches it is very likely this time overpasses the 20 secs max time to set up slave, so in this case there is first a slave role set and then a master role set after switch is served:

2017-09-13 17:36:47,384 | INFO | entLoopGroup-7-3 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:99 connected.
2017-09-13 17:36:47,384 | INFO | entLoopGroup-7-3 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | No context chain found for device: openflow:99, creating new.
2017-09-13 17:36:47,384 | INFO | entLoopGroup-7-3 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | ConnectionEvent: Device connected to controller, Device:/192.168.0.24:49062, NodeId:Uri [_value=openflow:99]
2017-09-13 17:36:47,384 | INFO | entLoopGroup-7-3 | RoleContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started timer for setting SLAVE role on device openflow:99 if no role will be set in 20s.

2017-09-13 17:37:07,394 | INFO | pool-40-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:99]]]]}

], augmentation=[]]
2017-09-13 17:37:07,394 | INFO | pool-40-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | RoleChangeTask called on device:openflow:99 OFPRole:BECOMESLAVE
2017-09-13 17:37:07,394 | INFO | pool-40-thread-1 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | getGenerationIdFromDevice called for device: openflow:99
2017-09-13 17:37:07,899 | INFO | entLoopGroup-7-3 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange called for device:Uri [_value=openflow:99], role:BECOMESLAVE
2017-09-13 17:37:08,398 | INFO | entLoopGroup-7-3 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange onSuccess for device:Uri [_value=openflow:99], role:BECOMESLAVE
2017-09-13 17:37:08,398 | INFO | entLoopGroup-7-3 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0 | Stopping reconciliation for node Uri [_value=openflow:99]
2017-09-13 17:37:08,398 | INFO | entLoopGroup-7-3 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Role SLAVE was granted to device openflow:99

2017-09-13 17:38:44,241 | INFO | lt-dispatcher-19 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting DeviceContextImpl[NEW] service for node openflow:99
2017-09-13 17:38:44,250 | INFO | entLoopGroup-7-3 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | IP address of the node openflow:99 is: IpAddress [_ipv4Address=Ipv4Address [_value=192.168.0.24]]
2017-09-13 17:38:44,250 | INFO | entLoopGroup-7-3 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Port number of the node openflow:99 is: 49062
2017-09-13 17:38:44,263 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:99 info: OFPMPMETERFEATURES collected
2017-09-13 17:38:44,263 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:99 info: OFPMPGROUPFEATURES collected
2017-09-13 17:38:44,263 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:99 info: OFPMPPORTDESC collected
2017-09-13 17:38:44,264 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:99 successfully finished collecting
2017-09-13 17:38:44,264 | INFO | lt-dispatcher-19 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RpcContextImpl[NEW] service for node openflow:99
2017-09-13 17:38:44,284 | INFO | lt-dispatcher-19 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting StatisticsContextImpl[NEW] service for node openflow:99
2017-09-13 17:38:44,284 | INFO | lt-dispatcher-19 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RoleContextImpl[NEW] service for node openflow:99
2017-09-13 17:38:44,285 | INFO | lt-dispatcher-19 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:99]]]]}

], augmentation=[]]
2017-09-13 17:38:44,285 | INFO | lt-dispatcher-19 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | RoleChangeTask called on device:openflow:99 OFPRole:BECOMEMASTER
2017-09-13 17:38:44,285 | INFO | lt-dispatcher-19 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | getGenerationIdFromDevice called for device: openflow:99
2017-09-13 17:38:44,285 | INFO | lt-dispatcher-19 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started clustering services for node openflow:99
2017-09-13 17:38:44,292 | INFO | entLoopGroup-7-3 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange called for device:Uri [_value=openflow:99], role:BECOMEMASTER
2017-09-13 17:38:44,297 | INFO | entLoopGroup-7-3 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange onSuccess for device:Uri [_value=openflow:99], role:BECOMEMASTER
2017-09-13 17:38:44,381 | INFO | entLoopGroup-7-3 | FlowNodeReconciliationImpl | 268 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.5.0 | Triggering reconciliation for device NodeKey [_id=Uri [_value=openflow:99]]
2017-09-13 17:38:44,383 | INFO | dapter-thread-17 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:99 connection is enabled by reconciliation framework.
2017-09-13 17:38:44,461 | INFO | dapter-thread-17 | StatisticsManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Scheduling statistics poll for device: Uri [_value=openflow:99]
2017-09-13 17:38:44,461 | INFO | dapter-thread-17 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:99 is able to work as master.
2017-09-13 17:38:44,461 | INFO | dapter-thread-17 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Publishing node added notification for Uri [_value=openflow:99]

3) Finally, this is the problematic scenario that happens when a switch is processed just before the 20 secs max time to set up slave, controller in this case mess ups all as it tries to setup master and slave roles simultaneously as you can see below:

2017-09-13 17:36:47,413 | INFO | entLoopGroup-7-6 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Device openflow:100 connected.
2017-09-13 17:36:47,413 | INFO | entLoopGroup-7-6 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | No context chain found for device: openflow:100, creating new.
2017-09-13 17:36:47,413 | INFO | entLoopGroup-7-6 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | ConnectionEvent: Device connected to controller, Device:/192.168.0.24:49257, NodeId:Uri [_value=openflow:100]
2017-09-13 17:36:47,414 | INFO | entLoopGroup-7-6 | RoleContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started timer for setting SLAVE role on device openflow:100 if no role will be set in 20s.

2017-09-13 17:37:07,285 | INFO | lt-dispatcher-49 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting DeviceContextImpl[NEW] service for node openflow:100
2017-09-13 17:37:07,294 | INFO | entLoopGroup-7-6 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | IP address of the node openflow:100 is: IpAddress [_ipv4Address=Ipv4Address [_value=192.168.0.24]]
2017-09-13 17:37:07,294 | INFO | entLoopGroup-7-6 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Port number of the node openflow:100 is: 49257
2017-09-13 17:37:07,304 | INFO | entLoopGroup-7-6 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:100 info: OFPMPMETERFEATURES collected
2017-09-13 17:37:07,304 | INFO | entLoopGroup-7-6 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:100 info: OFPMPGROUPFEATURES collected
2017-09-13 17:37:07,305 | INFO | entLoopGroup-7-6 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:100 info: OFPMPPORTDESC collected
2017-09-13 17:37:07,305 | INFO | entLoopGroup-7-6 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Static node openflow:100 successfully finished collecting
2017-09-13 17:37:07,305 | INFO | lt-dispatcher-49 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RpcContextImpl[NEW] service for node openflow:100
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting StatisticsContextImpl[NEW] service for node openflow:100
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Starting RoleContextImpl[NEW] service for node openflow:100
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:100]]]]}

], augmentation=[]]
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | RoleChangeTask called on device:openflow:100 OFPRole:BECOMEMASTER
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | getGenerationIdFromDevice called for device: openflow:100
2017-09-13 17:37:07,309 | INFO | lt-dispatcher-49 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Started clustering services for node openflow:100
2017-09-13 17:37:07,349 | INFO | entLoopGroup-7-6 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange called for device:Uri [_value=openflow:100], role:BECOMEMASTER

2017-09-13 17:37:07,422 | INFO | pool-40-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:100]]]]}

], augmentation=[]]
2017-09-13 17:37:07,422 | INFO | pool-40-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | RoleChangeTask called on device:openflow:100 OFPRole:BECOMESLAVE
2017-09-13 17:37:07,422 | INFO | pool-40-thread-1 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | getGenerationIdFromDevice called for device: openflow:100
2017-09-13 17:37:07,422 | WARN | pool-40-thread-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Not able to set MASTER role on device openflow:100, reason: Was not able to propagate MASTER role on device. Error: java.util.concurrent.CancellationException: Task was cancelled.
2017-09-13 17:37:07,422 | WARN | pool-40-thread-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | This mastering is mandatory, destroying context chain and closing connection for device openflow:100.
2017-09-13 17:37:07,422 | INFO | pool-40-thread-1 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0 | Stopping reconciliation for node Uri [_value=openflow:100]
2017-09-13 17:37:07,423 | INFO | pool-40-thread-1 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Closed clustering services registration for node openflow:100
2017-09-13 17:37:07,423 | INFO | pool-40-thread-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Terminating DeviceContextImpl[RUNNING] service for node openflow:100
2017-09-13 17:37:07,423 | INFO | pool-40-thread-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Terminating RpcContextImpl[RUNNING] service for node openflow:100
2017-09-13 17:37:07,446 | INFO | pool-40-thread-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Terminating StatisticsContextImpl[RUNNING] service for node openflow:100
2017-09-13 17:37:07,446 | INFO | pool-40-thread-1 | StatisticsContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Stopping running statistics gathering for node openflow:100
2017-09-13 17:37:07,446 | WARN | pool-40-thread-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Not able to set MASTER role on device openflow:100, reason: Initial gathering statistics unsuccessful.
2017-09-13 17:37:07,446 | WARN | pool-40-thread-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | This mastering is mandatory, destroying context chain and closing connection for device openflow:100.
2017-09-13 17:37:07,446 | INFO | pool-40-thread-1 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0 | Stopping reconciliation for node Uri [_value=openflow:100]
2017-09-13 17:37:07,447 | INFO | pool-40-thread-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Terminating RoleContextImpl[RUNNING] service for node openflow:100
2017-09-13 17:37:07,451 | WARN | pool-40-thread-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | Not able to set SLAVE role on device openflow:100, reason: Was not able to propagate SLAVE role on device. Error: org.opendaylight.openflowplugin.impl.role.RoleChangeException: Connection closed.
2017-09-13 17:37:07,451 | INFO | pool-40-thread-1 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0 | Stopping reconciliation for node Uri [_value=openflow:100]
2017-09-13 17:37:07,927 | INFO | entLoopGroup-7-6 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0 | submitRoleChange onSuccess for device:Uri [_value=openflow:100], role:BECOMEMASTER

Since this can happen with relatively few switches connecting at the same time and the impact is really bad (only solution is restart controller), I am raising this issue as BLOCKER for nitrogen.

Comment by Tomas Slusny [ 14/Sep/17 ]

Can you test it with this patch Luis? https://git.opendaylight.org/gerrit/#/c/63123/

Comment by Luis Gomez [ 14/Sep/17 ]

Result is bit better but I still see the issue:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-nitrogen/147/odl1_karaf.log.gz

— Device disconnects because load —

2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | SystemNotificationsListenerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | ConnectionEvent: Connection closed by device, Device:/10.29.13.226:53000, NodeId:openflow:109
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:109 disconnected.
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Publishing node removed notification for Uri [_value=openflow:109]
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Closed clustering services registration for node openflow:109
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating DeviceContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:00:19,597 | INFO | entLoopGroup-7-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating RpcContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:00:19,598 | INFO | entLoopGroup-7-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating StatisticsContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:00:19,598 | INFO | entLoopGroup-7-1 | StatisticsContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Stopping running statistics gathering for node openflow:109
2017-09-14 18:00:19,598 | INFO | entLoopGroup-7-1 | StatisticsManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics gathering for device openflow:109 was cancelled.
2017-09-14 18:00:19,598 | INFO | entLoopGroup-7-1 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating RoleContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:00:19,598 | INFO | pool-41-thread-1 | DeviceMastershipManager | 268 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.5.0.SNAPSHOT | Unregistered FRM cluster singleton service for service id : openflow:109
2017-09-14 18:00:21,381 | INFO | lt-dispatcher-21 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]
2017-09-14 18:00:21,381 | INFO | lt-dispatcher-21 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Role SLAVE was granted to device openflow:109
2017-09-14 18:00:21,381 | INFO | ofppool-31 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Closed clustering services for node openflow:109
2017-09-14 18:00:21,391 | INFO | lt-dispatcher-24 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Try to remove device Uri [_value=openflow:109] from operational DS
2017-09-14 18:00:21,391 | INFO | lt-dispatcher-24 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Removing device from operational DS Uri [_value=openflow:109] was successful

— Device Reconnects —

2017-09-14 18:00:53,573 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:109 connected.
2017-09-14 18:00:53,573 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | No context chain found for device: openflow:109, creating new.
2017-09-14 18:00:53,573 | INFO | entLoopGroup-7-8 | DeviceManagerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.29.13.226:53478, NodeId:Uri [_value=openflow:109]
2017-09-14 18:00:53,573 | INFO | entLoopGroup-7-8 | RoleContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Started timer for setting SLAVE role on device openflow:109 if no role will be set in 20s.

— Master process starts —

2017-09-14 18:01:12,485 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting DeviceContextImpl[NEW] service for node openflow:109
2017-09-14 18:01:12,589 | INFO | entLoopGroup-7-8 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | IP address of the node openflow:109 is: IpAddress [_ipv4Address=Ipv4Address [_value=10.29.13.226]]
2017-09-14 18:01:12,589 | INFO | entLoopGroup-7-8 | DeviceInitializationUtil | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Port number of the node openflow:109 is: 53478

— 20 secs slave role kicks in the middle and problem happens —

2017-09-14 18:01:13,581 | INFO | pool-44-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:109]]]]}

], augmentation=[]]
2017-09-14 18:01:13,581 | INFO | pool-44-thread-1 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | RoleChangeTask called on device:openflow:109 OFPRole:BECOMESLAVE
2017-09-14 18:01:13,581 | INFO | pool-44-thread-1 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:109
2017-09-14 18:01:13,655 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Static node openflow:109 info: OFPMPMETERFEATURES collected
2017-09-14 18:01:13,655 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Static node openflow:109 info: OFPMPGROUPFEATURES collected
2017-09-14 18:01:13,655 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Static node openflow:109 info: OFPMPPORTDESC collected
2017-09-14 18:01:13,656 | INFO | entLoopGroup-7-8 | OF13DeviceInitializer | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Static node openflow:109 successfully finished collecting
2017-09-14 18:01:13,656 | INFO | entLoopGroup-7-8 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:109], role:BECOMESLAVE
2017-09-14 18:01:13,658 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting RpcContextImpl[NEW] service for node openflow:109
2017-09-14 18:01:13,671 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting StatisticsContextImpl[NEW] service for node openflow:109
2017-09-14 18:01:13,671 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting RoleContextImpl[NEW] service for node openflow:109
2017-09-14 18:01:13,671 | INFO | lt-dispatcher-24 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:109]]]]}

], augmentation=[]]
2017-09-14 18:01:13,671 | INFO | lt-dispatcher-24 | SalRoleServiceImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | RoleChangeTask called on device:openflow:109 OFPRole:BECOMEMASTER
2017-09-14 18:01:13,671 | INFO | lt-dispatcher-24 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:109
2017-09-14 18:01:13,671 | WARN | lt-dispatcher-24 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Not able to set SLAVE role on device openflow:109, reason: Was not able to propagate SLAVE role on device. Error: java.util.concurrent.CancellationException: Task was cancelled.
2017-09-14 18:01:13,672 | INFO | lt-dispatcher-24 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]
2017-09-14 18:01:13,672 | INFO | lt-dispatcher-24 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Closed clustering services registration for node openflow:109
2017-09-14 18:01:13,672 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating DeviceContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:01:13,673 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating RpcContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:01:13,673 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating StatisticsContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:01:13,673 | INFO | lt-dispatcher-24 | StatisticsContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Stopping running statistics gathering for node openflow:109
2017-09-14 18:01:13,673 | WARN | lt-dispatcher-24 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Not able to set MASTER role on device openflow:109, reason: Initial gathering statistics unsuccessful.
2017-09-14 18:01:13,674 | WARN | lt-dispatcher-24 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | This mastering is mandatory, destroying context chain and closing connection for device openflow:109.
2017-09-14 18:01:13,674 | INFO | lt-dispatcher-24 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]
2017-09-14 18:01:13,674 | INFO | lt-dispatcher-24 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Terminating RoleContextImpl[STARTING] service for node openflow:109
2017-09-14 18:01:13,674 | INFO | lt-dispatcher-24 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:109], role:BECOMESLAVE
2017-09-14 18:01:13,674 | WARN | lt-dispatcher-24 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | RpcError on submitRoleChange for Uri [_value=openflow:109]: RpcError [message=Connection closed., severity=ERROR, errorType=APPLICATION, tag=, applicationTag=null, info=null, cause=null]
2017-09-14 18:01:13,676 | INFO | ult-dispatcher-4 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Try to remove device Uri [_value=openflow:109] from operational DS
2017-09-14 18:01:13,683 | INFO | ult-dispatcher-4 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Removing device from operational DS Uri [_value=openflow:109] was successful
2017-09-14 18:01:13,687 | INFO | entLoopGroup-7-8 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:109], role:BECOMEMASTER
2017-09-14 18:01:13,745 | INFO | lt-dispatcher-24 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Started clustering services for node openflow:109
2017-09-14 18:01:14,569 | INFO | entLoopGroup-7-8 | RoleService | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:109], role:BECOMEMASTER
2017-09-14 18:03:06,220 | INFO | lt-dispatcher-25 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]
2017-09-14 18:03:06,220 | INFO | lt-dispatcher-25 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Role SLAVE was granted to device openflow:109
2017-09-14 18:03:06,220 | INFO | lt-dispatcher-25 | GuardedContextImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Stopping RoleContextImpl[RUNNING] service for node openflow:109
2017-09-14 18:03:06,220 | INFO | ofppool-40 | ContextChainImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Closed clustering services for node openflow:109
2017-09-14 18:15:29,384 | INFO | entLoopGroup-7-8 | SystemNotificationsListenerImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | ConnectionEvent: Connection closed by device, Device:/10.29.13.226:53478, NodeId:openflow:109
2017-09-14 18:15:29,384 | INFO | entLoopGroup-7-8 | ContextChainHolderImpl | 278 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:109 disconnected.
2017-09-14 18:15:29,384 | INFO | entLoopGroup-7-8 | ReconciliationManagerImpl | 271 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.5.0.SNAPSHOT | Stopping reconciliation for node Uri [_value=openflow:109]

Comment by Tomas Slusny [ 18/Sep/17 ]

Thank you for logs, it helped a lot. The failure case was because SLAVE role future was cancelled because we started sending MASTER role, and we sent slave role not aquired notification to mastership service, so I added check for CancellationException so this will not happend anymore. Updated both patches (master, stable/nitrogen), so can you re-test it Louis?

Comment by Luis Gomez [ 18/Sep/17 ]

Good news last patchset:

https://git.opendaylight.org/gerrit/#/c/63123/

gets scalability as should be:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-nitrogen/plot/Switch%20Scalability/

Comment by Luis Gomez [ 18/Sep/17 ]

Nitrogen patch: https://git.opendaylight.org/gerrit/#/c/63146

Comment by Tomas Slusny [ 19/Sep/17 ]

Carbon patch: https://git.opendaylight.org/gerrit/#/c/63275/
It is a bit different because there is no RoleContext that centralizes role management in Carbon, what complicates things.

Comment by Luis Gomez [ 20/Sep/17 ]

From your comment: what do we want to do here? are we not going to cherry-pick nitrogen cluster fixes to carbon?

Comment by Luis Gomez [ 20/Sep/17 ]

Also the carbon patch does not work:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-carbon/409/odl1_karaf.log.gz

From the log we see connection ongoing but actually nothing shows in operational for switch openflow:146

2017-09-20 20:17:15,317 | INFO | entLoopGroup-7-2 | ContextChainHolderImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Device openflow:146 connected.
2017-09-20 20:17:15,317 | INFO | entLoopGroup-7-2 | DeviceManagerImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.29.12.83:54256, NodeId:Uri [_value=openflow:146]
2017-09-20 20:17:15,317 | INFO | entLoopGroup-7-2 | ContextChainHolderImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Started timer for setting SLAVE role on node openflow:146 if no role will be set in 20s.
2017-09-20 20:17:15,317 | INFO | entLoopGroup-7-2 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Registering clustering services for node openflow:146
2017-09-20 20:17:15,317 | INFO | entLoopGroup-7-2 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Registered clustering services for node openflow:146
2017-09-20 20:17:35,196 | INFO | ult-dispatcher-2 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting clustering services for node openflow:146
2017-09-20 20:17:35,196 | INFO | ult-dispatcher-2 | DeviceContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting device context cluster services for node openflow:146
2017-09-20 20:17:35,276 | INFO | pool-31-thread-1 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:146]]]]}

], augmentation=[]]
2017-09-20 20:17:35,276 | INFO | pool-31-thread-1 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | RoleChangeTask called on device:openflow:146 OFPRole:BECOMESLAVE
2017-09-20 20:17:35,276 | INFO | pool-31-thread-1 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:146
2017-09-20 20:17:35,281 | INFO | entLoopGroup-7-2 | DeviceInitializationUtil | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | IP address of the node openflow:146 is: IpAddress [_ipv4Address=Ipv4Address [_value=10.29.12.83]]
2017-09-20 20:17:35,281 | INFO | entLoopGroup-7-2 | DeviceInitializationUtil | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Port number of the node openflow:146 is: 54256
2017-09-20 20:17:35,316 | INFO | entLoopGroup-7-2 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:146], role:BECOMESLAVE
2017-09-20 20:17:35,316 | INFO | entLoopGroup-7-2 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:146 info: OFPMPMETERFEATURES collected
2017-09-20 20:17:35,316 | INFO | entLoopGroup-7-2 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:146 info: OFPMPGROUPFEATURES collected
2017-09-20 20:17:35,316 | INFO | entLoopGroup-7-2 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:146 info: OFPMPPORTDESC collected
2017-09-20 20:17:35,316 | INFO | entLoopGroup-7-2 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:146 successfully finished collecting
2017-09-20 20:17:35,322 | INFO | ult-dispatcher-2 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:146]]]]}

], augmentation=[]]
2017-09-20 20:17:35,322 | INFO | ult-dispatcher-2 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | RoleChangeTask called on device:openflow:146 OFPRole:BECOMEMASTER
2017-09-20 20:17:35,322 | INFO | ult-dispatcher-2 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:146
2017-09-20 20:17:35,323 | ERROR | ult-dispatcher-2 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:146]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-09-20 20:17:35,323 | INFO | ult-dispatcher-2 | RpcContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting rpc context cluster services for node openflow:146
2017-09-20 20:17:35,332 | INFO | entLoopGroup-7-2 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:146], role:BECOMESLAVE
2017-09-20 20:17:35,351 | INFO | entLoopGroup-7-2 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:146], role:BECOMEMASTER
2017-09-20 20:17:35,374 | INFO | entLoopGroup-7-2 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:146], role:BECOMEMASTER
2017-09-20 20:17:35,387 | INFO | ult-dispatcher-2 | StatisticsContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting statistics context cluster services for node openflow:146
2017-09-20 20:17:35,387 | INFO | ult-dispatcher-2 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Started clustering services for node openflow:146
2017-09-20 20:17:35,487 | INFO | entLoopGroup-7-2 | StatisticsManagerImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Scheduling statistics poll for device: Uri [_value=openflow:146]

Comment by Tomas Slusny [ 21/Sep/17 ]

Hmm that is weird, logs looks fine. I noticed some weird thing (that slave task did not got fully cancelled for some reason) so I uploaded new patch.

Comment by Luis Gomez [ 22/Sep/17 ]

Same issue, still mess up roles in carbon:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-carbon/411/odl1_karaf.log.gz

Master and Slave at the same time and scale test fails on this switch:

2017-09-21 20:59:10,486 | INFO | ult-dispatcher-4 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting clustering services for node openflow:73
2017-09-21 20:59:10,486 | INFO | ult-dispatcher-4 | DeviceContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting device context cluster services for node openflow:73
2017-09-21 20:59:10,634 | INFO | pool-28-thread-1 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:73]]]]}

], augmentation=[]]
2017-09-21 20:59:10,634 | INFO | pool-28-thread-1 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | RoleChangeTask called on device:openflow:73 OFPRole:BECOMESLAVE
2017-09-21 20:59:10,634 | INFO | pool-28-thread-1 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:73
2017-09-21 20:59:10,662 | INFO | entLoopGroup-7-3 | DeviceInitializationUtil | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | IP address of the node openflow:73 is: IpAddress [_ipv4Address=Ipv4Address [_value=10.29.15.165]]
2017-09-21 20:59:10,662 | INFO | entLoopGroup-7-3 | DeviceInitializationUtil | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Port number of the node openflow:73 is: 36476
2017-09-21 20:59:10,726 | INFO | entLoopGroup-7-3 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:73], role:BECOMESLAVE
2017-09-21 20:59:10,726 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:73 info: OFPMPMETERFEATURES collected
2017-09-21 20:59:10,726 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:73 info: OFPMPGROUPFEATURES collected
2017-09-21 20:59:10,726 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:73 info: OFPMPPORTDESC collected
2017-09-21 20:59:10,726 | INFO | entLoopGroup-7-3 | OF13DeviceInitializer | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Static node openflow:73 successfully finished collecting
2017-09-21 20:59:10,729 | INFO | ult-dispatcher-4 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMEMASTER, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:73]]]]}

], augmentation=[]]
2017-09-21 20:59:10,729 | INFO | ult-dispatcher-4 | SalRoleServiceImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | RoleChangeTask called on device:openflow:73 OFPRole:BECOMEMASTER
2017-09-21 20:59:10,729 | INFO | ult-dispatcher-4 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:73
2017-09-21 20:59:10,729 | INFO | ult-dispatcher-4 | RpcContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting rpc context cluster services for node openflow:73
2017-09-21 20:59:10,735 | INFO | ult-dispatcher-4 | StatisticsContextImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Starting statistics context cluster services for node openflow:73
2017-09-21 20:59:10,736 | INFO | ult-dispatcher-4 | ContextChainImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Started clustering services for node openflow:73
2017-09-21 20:59:10,853 | INFO | entLoopGroup-7-3 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:73], role:BECOMESLAVE
2017-09-21 20:59:10,853 | INFO | entLoopGroup-7-3 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange called for device:Uri [_value=openflow:73], role:BECOMEMASTER
2017-09-21 20:59:10,861 | INFO | entLoopGroup-7-3 | RoleService | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:73], role:BECOMEMASTER
2017-09-21 20:59:10,926 | INFO | entLoopGroup-7-3 | StatisticsManagerImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.2.SNAPSHOT | Scheduling statistics poll for device: Uri [_value=openflow:73]

Comment by Luis Gomez [ 22/Sep/17 ]

Change title to reflect only carbon has the issue now.

Comment by Tomas Slusny [ 26/Sep/17 ]

Alright, I tried to reproduce what was changed in role context in nitrogen and master, can you try this issue with this patch: https://git.opendaylight.org/gerrit/#/c/63536/ luis?

Comment by Luis Gomez [ 27/Sep/17 ]

OK, the last patch seems to work in carbon but now we have new issue preventing switch scale test to work properly:

https://bugs.opendaylight.org/show_bug.cgi?id=9217

Comment by Tomas Slusny [ 27/Sep/17 ]

Alright the patch on carbon got merged, so should we wait until the new bug is closed before we close this one?

Comment by Luis Gomez [ 27/Sep/17 ]

No thats fine, I already tested it.

Generated at Wed Feb 07 20:33:47 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.