[OPNFLWPLUG-635] Lifecycle cleaning Created: 15/Mar/16  Updated: 27/Sep/21  Resolved: 16/May/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Slezák Assignee: Jozef Bacigal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive log20160316.zip    
Issue Links:
Blocks
blocks OPNFLWPLUG-634 Li: MDSalRegistrationUtil change Resolved
blocks OPNFLWPLUG-670 [Clustering] On connection flapping w... Resolved
blocks OPNFLWPLUG-671 Entity-owners not correct after recon... Resolved
is blocked by L2SWITCH-64 Beryllium Test Failures Resolved
is blocked by OPNFLWPLUG-638 Milestone: Add "conductor" for change... Resolved
is blocked by OPNFLWPLUG-639 Milestone: Replace DeviceState with s... Resolved
External issue ID: 5523

 Description   

When switches are restarted while the controller is running, the controller has issues cleaning up its state.

We tested this on stable/beryllium SR1 using a loop: start_network.sh, sleep 0.5 sec, stop_network.sh.

Please also reproduce it on a 3-node cluster.
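
The loop described above could be scripted roughly as follows. This is only a sketch: the script names come from this report, but their location (the current directory), the absence of arguments, the iteration count, and the function name `flap_network` are assumptions made for illustration.

```python
import subprocess
import time


def flap_network(iterations, delay=0.5, run=subprocess.run, sleep=time.sleep):
    """Start the test network, wait `delay` seconds, then stop it, repeatedly.

    Assumes start_network.sh and stop_network.sh (named in this report) are in
    the current directory and take no arguments; that is a guess, not something
    the report states. `run` and `sleep` are injectable so the loop can be
    exercised without a real network.
    """
    for _ in range(iterations):
        # check=True raises CalledProcessError if a script exits non-zero.
        run(["./start_network.sh"], check=True)
        sleep(delay)
        run(["./stop_network.sh"], check=True)
```

Calling `flap_network(100)` from the directory holding the scripts would flap the network 100 times; the count is arbitrary, since the report does not say how many iterations were needed to hit the problem.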



 Comments   
Comment by Jozef Slezák [ 16/Mar/16 ]

Please take a look at these log messages in the attached log20160316.zip. During the last round of testing on stable/beryllium I saw a node in the operational inventory for a disconnected network; after that, not all flows were reconciled.

2016-03-16 18:42:04,174 | WARN | pool-33-thread-1 | RpcContextImpl | 157 - org.opendaylight.openflowplugin.impl - 0.2.1.SNAPSHOT | Xid cannot be reserved for new RequestContext, node:Uri [_value=openflow:18087942711149584731]
2

org.opendaylight.yangtools.yang.data.api.schema.tree.ModifiedNodeDoesNotExistException: Node /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:18088224186126241907}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)group/group[{(urn:opendaylight:flow:inventory?revision=2013-08-19)group-id=3}] does not exist. Cannot apply modification to its children.

Comment by Jozef Slezák [ 16/Mar/16 ]

Attachment log20160316.zip has been added with description: Log files after restarting OVS2.4 on running controller

Comment by Jozef Slezák [ 17/Mar/16 ]

Vaso, we have a Robot test to reproduce this problem.

Comment by A H [ 06/May/16 ]

A patch for this bug was submitted:

https://git.opendaylight.org/gerrit/#/c/38369/
https://git.opendaylight.org/gerrit/#/c/38111/
https://git.opendaylight.org/gerrit/#/c/37661/

To better assess the impact of this bug and fix, could someone from your team please help us identify the following:

Severity: Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Beryllium without it? Is there a workaround such that we can write a release note and fix it in a future Beryllium SR3?
Testing: Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?
Impact: Does this fix impact any dependent projects?

Comment by Jozef Bacigal [ 09/May/16 ]

This patch was made on behalf of the clustering changes that needed to be implemented. The changes were made and merged before the SR2 release, and we tested them on a cluster at our site. The priority of the changes was stability under connection flapping; we rewrote the whole connection and disconnection logic and introduced a double-candidate approach in the plugin. The three patches mentioned took too long to test, but we can now test on a cluster better than we could before, so we have made a step forward. These patches were meant to be ready before SR2 and merged before the lock. Most of us agreed it is a regression for SR2 in a “cluster environment”. I don’t think we can provide a workaround; the patch is too complex to simplify.

We have been testing it on our cluster every day for the last two weeks. We don’t have any Robot tests or integration tests, only some unit tests.

This fix doesn’t impact any dependent projects; it changes only the plugin's behavior in a cluster, such as mastership changes. It does not change statistics gathering, RPC services, or YANG models.

The patches were already merged into beryllium, so I will close this bug. If errors or bugs come out of testing, we should raise another bug.

Comment by A H [ 09/May/16 ]

Have we verified if the bug has been fixed in the latest build for Beryllium SR2?

https://wiki.opendaylight.org/view/Simultaneous_Release:Beryllium_Release_Plan#Beryllium_SR2_Download
