[LACP-10] RPC exception messages are thrown while scaling the number of LAGs to 32 entries. Created: 02/Jun/15  Updated: 09/Jun/15  Resolved: 09/Jun/15

Status: Resolved
Project: lacp
Component/s: General
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Mahesh Manivasagam Assignee: Kalaiselvi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive karaf_logs.zip    
External issue ID: 3539

 Description   

Mininet topology: 32 hosts and switches, with 2 links connecting each host to a switch. Configure the hosts in LACP mode and check whether the LAG groups are formed on the switch side.
Openvswitch version: 2.3.1
Build no: Integration build 2133
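
The layout described above could be computed roughly as follows. This is a minimal sketch, not the script used for this report: it assumes one switch per host, and the names build_topology, h1..h32 and s1..s32 are hypothetical. It only builds the node and link lists; passing them to Mininet and configuring LACP bonding on the hosts is left out.

```python
def build_topology(pairs=32, links_per_pair=2):
    """Return (hosts, switches, links) for the scale-test layout.

    Assumption: host h<i> is paired with switch s<i>, and every pair is
    joined by `links_per_pair` parallel links that form one LAG.
    """
    hosts = [f"h{i}" for i in range(1, pairs + 1)]
    switches = [f"s{i}" for i in range(1, pairs + 1)]
    links = []
    for host, switch in zip(hosts, switches):
        for port in range(links_per_pair):
            # (host, switch, index of the parallel link within the LAG)
            links.append((host, switch, port))
    return hosts, switches, links


if __name__ == "__main__":
    hosts, switches, links = build_topology()
    print(len(hosts), len(switches), len(links))
```

With the defaults this yields 32 hosts, 32 switches and 64 links; calling build_topology(pairs=64) gives the 64-LAG variant mentioned later in this report.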

Steps to recreate:
1. Bring up the controller and install the LACP feature.
2. Once the controller is set up, create a Mininet topology script as specified above and check whether the group entries are formed on all the switches. <<< Entries are formed as expected.
3. Also check the Karaf logs for any LACP-specific errors generated while scaling up to this number. <<< The RPC exception is seen at this point.

Error message generated on Karaf log:

2015-06-02 11:24:24,838 | ERROR | Thread-80 | LacpGroupTbl | 203 - org.opendaylight.lacp.main.lacp.main.impl - 1.0.0.SNAPSHOT | received interrupt {}java.util.concurrent.ExecutionException: org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: No implementation of RPC AbsoluteSchemaPath{path=[(urn:opendaylight:group:service?revision=2013-09-18)update-group]} available

<<< Judging by the message, the exception appears to come from the LacpGroupTbl implementation.

A slightly different message sometimes appears when the topology is created with the scaled-up number of LAGs first and the controller is started afterwards. The case below uses 64 LAGs:
2015-06-01 14:44:15,591 | ERROR | Thread-1073 | LacpFlow | 203 - org.opendaylight.lacp.main.lacp.main.impl - 1.0.0.SNAPSHOT | received interrupt in lacp flow removal java.util.concurrent.ExecutionException: org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: SchemaPath AbsoluteSchemaPath{path=[(urn:opendaylight:flow:service?revision=2013-08-19)remove-flow]} is not resolved to an RPC

<<< Judging by the message, the exception appears to come from the LacpFlow implementation.

The karaf.log files are attached to this bug.



 Comments   
Comment by Mahesh Manivasagam [ 02/Jun/15 ]

Attachment karaf_logs.zip has been added with description: Karaf_logs

Comment by Kalaiselvi [ 05/Jun/15 ]

The reported RPC exceptions are thrown because the script uses the set bridge command to specify the OpenFlow version. This command causes the OpenFlow connection to be reset and a new connection to be opened.
Any flow or group programming attempted at the moment the switch closes its connection to the controller fails with a java.util.concurrent.ExecutionException.
Because the switches are already started with version openflow13, the set bridge command does not need to be configured.
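
The timing problem described above can be illustrated with a toy model. Everything in it is hypothetical (ToyRpcRegistry, RpcNotAvailable, the node name openflow:1); it is not the controller's API, only a sketch of why a request issued while a switch is reconnecting finds no RPC implementation.

```python
class RpcNotAvailable(Exception):
    """Hypothetical stand-in for DOMRpcImplementationNotAvailableException."""


class ToyRpcRegistry:
    """Tracks which nodes currently have RPC implementations registered."""

    def __init__(self):
        self._connected = set()

    def connect(self, node):
        self._connected.add(node)

    def disconnect(self, node):
        self._connected.discard(node)

    def invoke(self, node, rpc):
        if node not in self._connected:
            raise RpcNotAvailable(
                f"No implementation of RPC {rpc} available for {node}"
            )
        return f"{rpc} sent to {node}"


def reset_connection_and_program(registry, node):
    """Model a connection reset with a group update landing in the gap."""
    registry.disconnect(node)      # switch drops the old connection
    try:
        return registry.invoke(node, "update-group")
    except RpcNotAvailable as exc:
        return f"failed: {exc}"
    finally:
        registry.connect(node)     # switch opens the new connection
```

In this model the update-group call succeeds before and after the reset and fails only inside the gap, which matches the observation that the group entries are still formed while errors appear in the log.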

When the switches are torn down and then restarted, some additional node update notifications are received because the LACP aggregator information is rewritten for the removed ports. This leaves the switch present in the in-memory db, so when the switches are brought up again, flow programming is skipped because the node already appears to be available there.
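
The skipped flow programming can be sketched with a toy in-memory node table. The class and method names below are hypothetical and do not come from the LACP module; the sketch only models the sequence described in this comment, where a late node update re-adds a removed switch and the next bring-up then programs nothing.

```python
class ToyNodeTable:
    """Hypothetical in-memory table of known switches."""

    def __init__(self):
        self.nodes = set()
        self.programmed = []   # switches that had flows programmed, in order

    def on_node_up(self, node):
        if node in self.nodes:
            return False       # already known: flow programming is skipped
        self.nodes.add(node)
        self.programmed.append(node)
        return True

    def on_node_removed(self, node):
        self.nodes.discard(node)

    def on_node_update(self, node):
        # A late update for a removed switch puts it back into the table
        # without programming any flows.
        self.nodes.add(node)


table = ToyNodeTable()
first_up = table.on_node_up("openflow:1")     # flows programmed
table.on_node_removed("openflow:1")           # switch torn down
table.on_node_update("openflow:1")            # stale update after teardown
second_up = table.on_node_up("openflow:1")    # skipped: node looks present
print(first_up, second_up)
```

Without the stale on_node_update call, the second on_node_up would program the flows again.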

Comment by Kalaiselvi [ 09/Jun/15 ]

https://git.opendaylight.org/gerrit/#/c/22054/
When ports are removed from a LAG, the write to the MD-SAL data store is now avoided.
Instead, only the deletion of the information is performed in the data store,
so node-update messages are no longer triggered.
