[OPNFLWPLUG-826] ovs recovery failed Created: 22/Nov/16  Updated: 27/Sep/21  Resolved: 02/Mar/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: zan cohen Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File file.pcap     File karaf.log.gz     Zip Archive karaf.zip    
External issue ID: 7223

 Description   

Description
************
1.Load Newton setup(Devstuck) :
Host1=controller+compute
Host2=compute
2.Lunch vm on Host 2.
3.Accosiate Fip to vm
4.connect to vm via ssh - o.k
5.send ping from vm to 8.8.8.8 -o.k

*Dump pipeline of compute which vm create on.
*Restart ovs
*Try to send ping and ssh to vm - fail.
*Dump pipeline again

resault
********
It can be seen that pipeline is empty from rules.

Note
****
Restart ovs of Host 1 and dump pipeline - pipeline is empty as well.



 Comments   
Comment by zan cohen [ 22/Nov/16 ]

Logs:
\\10.0.1.216\pub\zan_cohen\Bugzila_7223

Comment by zan cohen [ 22/Nov/16 ]

controller restart doesn't solve the problem as work around!!

Comment by zan cohen [ 23/Nov/16 ]

Defect doesn't reconstruct permanently.

Comment by Olga Schukin [ 12/Dec/16 ]

Failed to establish connection after OVS restart.

After initial logs analysis, it seems the role setting flow didn't finish successfully, related thread didn't return from the callback function getGenerationIdFromDevice()

                              • From karaf log: ********************

2016-11-22 16:05:20,135 | INFO | entLoopGroup-9-3 | SalRoleServiceImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:75770075299928]]]]}

], augmentation=[]]
2016-11-22 16:05:20,135 | INFO | entLoopGroup-9-3 | SalRoleServiceImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Requesting state change to BECOMESLAVE
2016-11-22 16:05:20,135 | INFO | entLoopGroup-9-3 | SalRoleServiceImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | RoleChangeTask called on device:openflow:75770075299928 OFPRole:BECOMESLAVE
2016-11-22 16:05:20,135 | INFO | entLoopGroup-9-3 | RoleService | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:75770075299928
2016-11-22 16:05:20,140 | INFO | entLoopGroup-9-3 | LifecycleServiceImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Starting clustering MASTER services for node openflow:75770075299928
2016-11-22 16:05:20,140 | INFO | entLoopGroup-9-3 | DeviceContextImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Starting device context cluster services for node openflow:75770075299928

2016-11-22 16:05:30,139 | WARN | pool-41-thread-1 | DeviceContextImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:75770075299928 during 10 sec
2016-11-22 16:05:30,141 | ERROR | pool-41-thread-1 | SalRoleServiceImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:75770075299928]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2016-11-22 16:05:30,141 | WARN | pool-41-thread-1 | DeviceManagerImpl | 268 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:75770075299928
**************************************************************

The following message appears each 10 sec "New role BECOMESLAVE was not propagated to device openflow:75770075299928 during 10 sec"

Comment by Koby Aizer [ 23/Feb/17 ]

I seemed to hit the same issue as well. Using a single node setup, I just restarted ODL (/bin/stop and then /bin/start), and the OVS couldn't connect the the ODL again.

I saw reccuring prints of:

2017-02-23 18:09:04,519 | INFO | ntLoopGroup-9-93 | ConnectionAdapterImpl | 306 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.2.SNAPSHOT | Hello received / branch
2017-02-23 18:09:04,521 | INFO | ntLoopGroup-9-93 | DeviceManagerImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.0.118.2:57396, NodeId:Uri [_va
lue=openflow:198223095212864]
2017-02-23 18:09:04,522 | INFO | ntLoopGroup-9-93 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value
=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.open daylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:198223095212864]]]]}

], augmentation=[]]
2017-02-23 18:09:04,522 | INFO | ntLoopGroup-9-93 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Requesting state change to BECOMESLAVE
2017-02-23 18:09:04,522 | INFO | ntLoopGroup-9-93 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | RoleChangeTask called on device:openflow:198223095212864 OFPRole:BECOMESLAVE
2017-02-23 18:09:04,522 | INFO | ntLoopGroup-9-93 | RoleService | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:198223095212864
2017-02-23 18:09:04,522 | WARN | ntLoopGroup-9-93 | LifecycleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Failed to register cluster singleton service for node org.opendaylight.openflowplugin.impl
.connection.ConnectionContextImpl$DeviceInfoImpl@4f7874de, with exception: java.lang.RuntimeException: org.opendaylight.mdsal.eos.common.api.CandidateAlreadyRegisteredException: Candidate has already been registered for DOMEntity [type=org.
opendaylight.mdsal.ServiceEntityType, id=/(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)entity/entity[

{(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)name=openflow:198 223095212864}

]]
2017-02-23 18:09:06,531 | WARN | pool-40-thread-1 | DeviceContextImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:198223095212864 during 10 sec
2017-02-23 18:09:06,531 | ERROR | pool-40-thread-1 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=N odeKey [_id=Uri [_value=openflow:198223095212864]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-02-23 18:09:06,531 | WARN | pool-40-thread-1 | DeviceManagerImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:198223095212864
2017-02-23 18:09:08,386 | WARN | ssionScavenger-4 | teInvalidatingHashSessionManager | 53 - org.ops4j.pax.web.pax-web-jetty - 3.2.9 | Timing out for 1 session(s) with id jxdj4yx1r8jexufsv20palhi
2017-02-23 18:09:12,520 | INFO | ntLoopGroup-9-94 | ConnectionAdapterImpl | 306 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.2.SNAPSHOT | Hello received / branch
2017-02-23 18:09:12,522 | INFO | ntLoopGroup-9-94 | DeviceManagerImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.0.118.2:57398, NodeId:Uri [_va
lue=openflow:198223095212864]
2017-02-23 18:09:12,523 | INFO | ntLoopGroup-9-94 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value
=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.open daylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:198223095212864]]]]}

], augmentation=[]]
2017-02-23 18:09:12,523 | INFO | ntLoopGroup-9-94 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Requesting state change to BECOMESLAVE
2017-02-23 18:09:12,523 | INFO | ntLoopGroup-9-94 | SalRoleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | RoleChangeTask called on device:openflow:198223095212864 OFPRole:BECOMESLAVE
2017-02-23 18:09:12,523 | INFO | ntLoopGroup-9-94 | RoleService | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:198223095212864
2017-02-23 18:09:12,523 | WARN | ntLoopGroup-9-94 | LifecycleServiceImpl | 311 - org.opendaylight.openflowplugin.impl - 0.3.2.SNAPSHOT | Failed to register cluster singleton service for node org.opendaylight.openflowplugin.impl
.connection.ConnectionContextImpl$DeviceInfoImpl@4f7874de, with exception: java.lang.RuntimeException: org.opendaylight.mdsal.eos.common.api.CandidateAlreadyRegisteredException: Candidate has already been registered for DOMEntity [type=org.
opendaylight.mdsal.ServiceEntityType, id=/(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)entity/entity[

{(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)name=openflow:198 223095212864}

]]

Comment by Koby Aizer [ 23/Feb/17 ]

Errors starting in:
2017-02-23 15:01:22,522 | INFO | entLoopGroup-9-2 | ConnectionAdapterImpl | 306 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.2.SNAPSHOT | Hello received / branch

Comment by Koby Aizer [ 23/Feb/17 ]

Attachment karaf.log.gz has been added with description: karaf.log

Comment by dotans [ 28/Feb/17 ]

Attachment file.pcap has been added with description: pcap file -OF port

Comment by dotans [ 28/Feb/17 ]

Attachment karaf.zip has been added with description: karaf.log -TRACE

Comment by dotans [ 28/Feb/17 ]

I did the same scenario as Kobi with debug level of karaf.log set to TRACE.
I also use tcpdump on port 6653.
both attached

Comment by dotans [ 02/Mar/17 ]

fix in stable/boron SR3:
https://git.opendaylight.org/gerrit/#/c/51489/7

Generated at Wed Feb 07 20:33:29 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.