[OPNFLWPLUG-600] He: Unable to control a OF 1.3 switch after a Role-Request message for the switch fails Created: 21/Jan/16  Updated: 27/Sep/21  Resolved: 11/Feb/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Hideyuki Tai Assignee: Anil Vishnoi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 5050

 Description   

When the Helium version of OpenFlow plugin connected to an OpenFlow 1.3 switch, it failed to execute a role push request.
As the result, the plugin could not do anything to the switch, including installing flow entries.

The issue occasionally happens when the Helium version connects to OpenFlow 1.3 switches.

For example, it occurred in the VTN CSIT, too.
https://jenkins.opendaylight.org/releng/view/vtn/job/vtn-csit-1node-manager-only-beryllium/67/artifact/karaf.log

I had not observed this issue before yesterday, so I guess the bug is created recently, likely yesterday.

I used the ODL controller which was built today from the latest source code of the stable/beryllium branch of the VTN project to detect the issue.
When I observed the issue, I ran just one single controller.

The error messages I saw in the karaf.log are as follows:

2016-01-21 06:45:08,556 | INFO | entLoopGroup-9-7 | ConnectionAdapterImpl | 159 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.7.0.SNAPSHOT | Hello received / branch
2016-01-21 06:45:08,558 | INFO | entLoopGroup-9-7 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | requestOpenflowEntityOwnership: Registered controller for the
ownership of Uri [_value=openflow:6]
2016-01-21 06:45:08,559 | INFO | entLoopGroup-9-7 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | setSlaveRole: Set controller as a SLAVE controller for 6
2016-01-21 06:45:08,559 | INFO | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Pushing SLAVE role configuration to device openflow:6
2016-01-21 06:45:08,559 | INFO | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Received generation-id 18446744073709551615 for role change request from device 6
2016-01-21 06:45:08,560 | INFO | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Pushing role change SLAVE config request with generation-id 0 to device 6
2016-01-21 06:45:08,560 | INFO | pool-28-thread-1 | StatNodeRegistrationImpl | 167 - org.opendaylight.openflowplugin.applications.statistics-manager - 0.2.0.SNAPSHOT | Received onNodeUpdated for node FlowCapableNodeUpdated [_description=None, _hardware=Open vSwitch, _manufacturer=Nicira, Inc., _serialNumber=None, _software=2.1.0]
2016-01-21 06:45:08,560 | INFO | lt-dispatcher-22 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: Set controller as a MASTER controller because it's the OWNER of the Entity{type='openflow', id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[

{(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)name=openflow:6}

]}
2016-01-21 06:45:08,564 | INFO | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | setSlaveRole: Controller is set as a SLAVE for 6

2016-01-21 06:45:08,567 | INFO | lt-dispatcher-22 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | registerRoutedRPCForSwitch: Registered routed rpc for ModelDrivenSwitch openflow:6
2016-01-21 06:45:08,567 | INFO | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Pushing MASTER role configuration to device openflow:6
2016-01-21 06:45:08,568 | WARN | entLoopGroup-9-7 | ResponseExpectedRpcListener | 159 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.7.0.SNAPSHOT | Request for RpcResultKey [xid=12, outputClazz=org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.protocol.rev130731.BarrierOutput] did not receive a response
2016-01-21 06:45:08,568 | WARN | entLoopGroup-9-7 | ConnectionAdapterImpl | 159 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.7.0.SNAPSHOT | received unexpected rpc response: RpcResultKey [xid=13, outputClazz=org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.protocol.rev130731.BarrierOutput]
016-01-21 06:45:08,568 | ERROR | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Role push request failed for device [6]
org.opendaylight.openflowplugin.openflow.md.core.session.RolePushException: Giving up role change: read generation-id failed Timeout waiting for task.
at org.opendaylight.openflowplugin.openflow.md.core.session.RolePushTask.call(RolePushTask.java:117)[162:org.opendaylight.openflowplugin:0.2.0.SNAPSHOT]
at org.opendaylight.openflowplugin.openflow.md.core.session.RolePushTask.call(RolePushTask.java:38)[162:org.opendaylight.openflowplugin:0.2.0.SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
2016-01-21 06:45:08,569 | ERROR | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Generation ID is NULL for device [6]
2016-01-21 06:45:08,569 | WARN | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: Controller is not able to set the MASTER role for 6.
org.opendaylight.openflowplugin.openflow.md.core.session.RolePushException: Giving up role change: current generation-id can not be read
at org.opendaylight.openflowplugin.openflow.md.core.session.RolePushTask.call(RolePushTask.java:129)[162:org.opendaylight.openflowplugin:0.2.0.SNAPSHOT]
at org.opendaylight.openflowplugin.openflow.md.core.session.RolePushTask.call(RolePushTask.java:38)[162:org.opendaylight.openflowplugin:0.2.0.SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
2016-01-21 06:45:08,569 | INFO | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: ..and this instance is owner of the device 6. Closing the registration, so other entity can become owner and attempt to be master controller.
2016-01-21 06:45:08,571 | INFO | lt-dispatcher-22 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | registerRoutedRPCForSwitch: Skipping routed rpc registration for ModelDrivenSwitch openflow:6
2016-01-21 06:45:08,575 | INFO | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | deregisterRoutedRPCForSwitch: De-registered routed rpc for ModelDrivenSwitch openflow:6
2016-01-21 06:45:08,575 | INFO | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: ..and registering it back to participate in ownership and re-try
2016-01-21 06:45:08,575 | INFO | ofEntity-0 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: re-registered candidate for ownership of the 6
2016-01-21 06:45:08,575 | INFO | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Pushing SLAVE role configuration to device openflow:6
2016-01-21 06:45:08,576 | INFO | ult-dispatcher-3 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | onDeviceOwnershipChanged: Set controller as a MASTER controller because it's the OWNER of the Entity{type='openflow', id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[

{(urn:opendaylight:params:xml:ns:yang: controller:md:sal:core:general-entity?revision=2015-08-20)name=openflow:6}

]}



 Comments   
Comment by Hideyuki Tai [ 21/Jan/16 ]

I'm thinking the following patch (Gerrit 32748) is related to the bug.

https://git.opendaylight.org/gerrit/#/c/32748
https://git.opendaylight.org/gerrit/#/c/32748/4/openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/session/RolePushTask.java

Comment by Hideyuki Tai [ 22/Jan/16 ]

Created a patch for the bug.
https://git.opendaylight.org/gerrit/#/c/33323/

I need to do more tests, and need comments from the OF plugin project.

Comment by Anil Vishnoi [ 22/Jan/16 ]

So this is what is happening. When role push fails, controller try to deregister the instance and want any other running instance of openflowplugin to take the ownership of the device. But in case of single node controller, ownership again granted to the same controller and it sees that rpc is already registered and it skips the registration. You patch should fix this.

Comment by Hideyuki Tai [ 22/Jan/16 ]

Here is my investigation note for the bug.

First of all, I observed that the OF plugin (Helium version) occasionally failed to get the response to a Role-Request messages.
Here is the ERROR message I observed in the karaf.log.

2016-01-21 06:45:08,568 | ERROR | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Role push request failed for device [6]

I'm not sure why the plugin failed to get the response.
Anyway, when the failure happened, applications of the OF plugin was unable to control the switch.
Investigating the message in the karaf.log, I think this was because the routed RPC of the OpenFlow plugin was not correctly registered.

Here are log messages I observed in the Karaf.log:

2016-01-21 06:45:08,583 | WARN | nPool-5-worker-7 | StatRpcMsgManagerImpl | 167 - org.opendaylight.openflowplugin.applications.statistics-manager - 0.2.0.SNAPSHOT | Response Registration for Statistics RPC call fail!
org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: No local or remote implementation available for rpc AbsoluteSchemaPath

{path=[(urn:opendaylight:group:statistics?revision=2013-11-11)get-group-features]}

at org.opendaylight.controller.remote.rpc.RemoteRpcImplementation$1.onComplete(RemoteRpcImplementation.java:65)[150:org.opendaylight.controller.sal-remoterpc-connector:1.3.0.SNAPSHOT]

2016-01-21 06:47:37,493 | ERROR | nPool-5-worker-5 | VTNPacketService | 175 - org.opendaylight.vtn.manager.implementation - 0.4.0.SNAPSHOT | transmit-packet: Failed to transmit packet.
org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: No local or remote implementation available for rpc AbsoluteSchemaPath

{path=[(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet]}

Investigating the source code of the OF plugin, I understand that when the failure on a Role-Request message happens, the OF plugin retries to send Role-Request messages again and again until the plugin successfully gets the response to a Role-Request to ensure the change the role of the controller to MASTER.
During these tries, the OF plugin registers and deregisters the routed RPC repeatedly. However, OfEntityManager of the OF plugin unregistered the routed RPC in the incorrect way. As the result, the OF plugin wrongly skipped to re-register the routed RPC. I think that's why it was unable to control a switch, once the Role-Request failed on the switch.

The following patch fixes the incorrect way of the deregistration.
https://git.opendaylight.org/gerrit/#/c/33323/ (stable/beryllium)

Comment by Hideyuki Tai [ 22/Jan/16 ]

(In reply to Anil Vishnoi from comment #3)
> So this is what is happening. When role push fails, controller try to
> deregister the instance and want any other running instance of
> openflowplugin to take the ownership of the device. But in case of single
> node controller, ownership again granted to the same controller and it sees
> that rpc is already registered and it skips the registration. You patch
> should fix this.

Hi Anil,
Thank you for reviewing my patch, and merge it.

Before I pushed a patch, I tested the patch in my local environment, and confirmed that the patch fixed the issue.

Comment by Hideyuki Tai [ 29/Jan/16 ]

My colleague has observed that the same issue still happened again and again on the latest source code of the stable/beryllium.

Comment by Anil Vishnoi [ 29/Jan/16 ]

Hi Hideyuki,

Can you please provide the role, i need to see why role change is failing in your environment.

Comment by Hideyuki Tai [ 29/Jan/16 ]

(In reply to Anil Vishnoi from comment #7)
> Hi Hideyuki,
>
> Can you please provide the role, i need to see why role change is failing in
> your environment.

He ran ODL controller in single not.

I try to get the information from my colleague.

To be honest, I tried to reproduce the issue with the latest artifacts on the stable/lithium in this week. But, I couldn't reproduce the issue. However, in my colleague machine environment, he can easily reproduce the issue many times. That's wired.

Comment by Hideyuki Tai [ 29/Jan/16 ]

(In reply to Hideyuki Tai from comment #8)
> (In reply to Anil Vishnoi from comment #7)
> > Hi Hideyuki,
> >
> > Can you please provide the role, i need to see why role change is failing in
> > your environment.
>
> He ran ODL controller in single not.
>

I mean in single node.

Comment by Hideyuki Tai [ 04/Feb/16 ]

(In reply to Anil Vishnoi from comment #7)
> Hi Hideyuki,
>
> Can you please provide the role, i need to see why role change is failing in
> your environment.

Hi Anil,

I and my colleagues tried to reproduce the issue with the latest artifacts for Beryllium.
And, we've not observed the issue on the latest artifacts.
We've concluded that some patches fix the issue completely.

So I'm changing the status of the bug report to "RESOLVED".

Thank you for your great support!

Comment by Hideyuki Tai [ 11/Feb/16 ]

The patch was merged in the master branch of the openflowplugin.git.
https://git.opendaylight.org/gerrit/#/c/34272/ (master)

Generated at Wed Feb 07 20:32:54 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.