[OPNFLWPLUG-600] He: Unable to control a OF 1.3 switch after a Role-Request message for the switch fails Created: 21/Jan/16 Updated: 27/Sep/21 Resolved: 11/Feb/16 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Hideyuki Tai | Assignee: | Anil Vishnoi |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 5050 |
| Description |
|
When the Helium version of the OpenFlow plugin connected to an OpenFlow 1.3 switch, it failed to execute a role push request. The issue happens occasionally when the Helium version connects to OpenFlow 1.3 switches; for example, it also occurred in the VTN CSIT. I had not observed this issue before yesterday, so I guess the bug was introduced recently, likely yesterday. To detect the issue, I used an ODL controller built today from the latest source code of the stable/beryllium branch of the VTN project. The error messages I saw in karaf.log are as follows:

2016-01-21 06:45:08,556 | INFO | entLoopGroup-9-7 | ConnectionAdapterImpl | 159 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.7.0.SNAPSHOT | Hello received / branch ]}
2016-01-21 06:45:08,567 | INFO | lt-dispatcher-22 | OfEntityManager | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | registerRoutedRPCForSwitch: Registered routed rpc for ModelDrivenSwitch openflow:6 ]} |
| Comments |
| Comment by Hideyuki Tai [ 21/Jan/16 ] |
|
I'm thinking the following patch (Gerrit 32748) is related to the bug. https://git.opendaylight.org/gerrit/#/c/32748 |
| Comment by Hideyuki Tai [ 22/Jan/16 ] |
|
Created a patch for the bug. I need to do more tests, and need comments from the OF plugin project. |
| Comment by Anil Vishnoi [ 22/Jan/16 ] |
|
So this is what is happening. When the role push fails, the controller tries to deregister the instance and wants any other running instance of openflowplugin to take ownership of the device. But in the case of a single-node controller, ownership is granted to the same controller again; it sees that the rpc is already registered and skips the registration. Your patch should fix this. |
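
The sequence described in the comment above can be sketched in a few lines of Java. This is only an illustrative model, not the plugin's code: the class `RoleRegistrationSketch`, its methods, and the `cleanUpOnDeregister` flag are hypothetical names introduced here, and the assumption that the fix amounts to dropping the stale rpc registration during deregistration is inferred from the discussion rather than taken from the patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the ownership / rpc-registration interaction; not the plugin's real classes. */
public class RoleRegistrationSketch {
    private final Set<String> registeredRpc = new HashSet<>();
    private final Set<String> usableSessions = new HashSet<>();
    private final boolean cleanUpOnDeregister;

    public RoleRegistrationSketch(boolean cleanUpOnDeregister) {
        this.cleanUpOnDeregister = cleanUpOnDeregister;
    }

    /** Called when this controller instance is granted ownership of a switch. */
    public void onOwnershipGranted(String nodeId) {
        if (registeredRpc.contains(nodeId)) {
            // A registration already exists, so registration is skipped
            // and the switch never becomes controllable again.
            return;
        }
        registeredRpc.add(nodeId);
        usableSessions.add(nodeId);
    }

    /** Called when the role push fails and the instance gives up the switch. */
    public void onRolePushFailed(String nodeId) {
        usableSessions.remove(nodeId);
        if (cleanUpOnDeregister) {
            // Assumed fix: also drop the rpc registration so a later grant can re-register.
            registeredRpc.remove(nodeId);
        }
    }

    public boolean canControl(String nodeId) {
        return usableSessions.contains(nodeId);
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            RoleRegistrationSketch controller = new RoleRegistrationSketch(fixed);
            controller.onOwnershipGranted("openflow:6");
            controller.onRolePushFailed("openflow:6");
            // Single node: the same instance is granted ownership again.
            controller.onOwnershipGranted("openflow:6");
            System.out.println("cleanup=" + fixed + " canControl=" + controller.canControl("openflow:6"));
        }
    }
}
```

In this model the switch stays uncontrollable when the stale registration survives the failed role push, and becomes controllable again once deregistration removes it.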
| Comment by Hideyuki Tai [ 22/Jan/16 ] |
|
Here is my investigation note for the bug. First of all, I observed that the OF plugin (Helium version) occasionally failed to get the response to a Role-Request message.

2016-01-21 06:45:08,568 | ERROR | ofEntity-0 | RolePushTask | 162 - org.opendaylight.openflowplugin - 0.2.0.SNAPSHOT | Role push request failed for device [6]

I'm not sure why the plugin failed to get the response. Here are the log messages I observed in karaf.log:

2016-01-21 06:45:08,583 | WARN | nPool-5-worker-7 | StatRpcMsgManagerImpl | 167 - org.opendaylight.openflowplugin.applications.statistics-manager - 0.2.0.SNAPSHOT | Response Registration for Statistics RPC call fail!
at org.opendaylight.controller.remote.rpc.RemoteRpcImplementation$1.onComplete(RemoteRpcImplementation.java:65)[150:org.opendaylight.controller.sal-remoterpc-connector:1.3.0.SNAPSHOT]
2016-01-21 06:47:37,493 | ERROR | nPool-5-worker-5 | VTNPacketService | 175 - org.opendaylight.vtn.manager.implementation - 0.4.0.SNAPSHOT | transmit-packet: Failed to transmit packet.

From investigating the source code of the OF plugin, I understand that when a Role-Request message fails, the OF plugin resends Role-Request messages again and again until it successfully gets a response, to ensure that the role of the controller changes to MASTER. The following patch fixes the incorrect deregistration. |
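
The retry behaviour described in the note above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the plugin's `RolePushTask`: the class `RolePushRetrySketch`, the method `pushRoleWithRetry`, and the `maxAttempts` bound are hypothetical (the note describes retrying until a response arrives, while this sketch caps the attempts so that it always terminates).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop for a role push; names are illustrative only. */
public class RolePushRetrySketch {

    /**
     * Sends a Role-Request through the given callback until it reports success.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    public static int pushRoleWithRetry(BooleanSupplier sendRoleRequest, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendRoleRequest.getAsBoolean()) {
                return attempt; // the switch answered; the role is assumed to be MASTER now
            }
        }
        return -1; // assumption: give up after maxAttempts instead of retrying forever
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated switch that answers only the third Role-Request.
        int used = pushRoleWithRetry(() -> ++calls[0] >= 3, 5);
        System.out.println("attempts used: " + used); // prints "attempts used: 3"
    }
}
```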
| Comment by Hideyuki Tai [ 22/Jan/16 ] |
|
(In reply to Anil Vishnoi from comment #3) Hi Anil, Before I pushed a patch, I tested the patch in my local environment, and confirmed that the patch fixed the issue. |
| Comment by Hideyuki Tai [ 29/Jan/16 ] |
|
My colleague has observed that the same issue still happens repeatedly on the latest source code of stable/beryllium. |
| Comment by Anil Vishnoi [ 29/Jan/16 ] |
|
Hi Hideyuki, Can you please provide the role? I need to see why the role change is failing in your environment. |
| Comment by Hideyuki Tai [ 29/Jan/16 ] |
|
(In reply to Anil Vishnoi from comment #7) He ran the ODL controller in single not. I will try to get the information from my colleague. To be honest, I tried to reproduce the issue with the latest artifacts on stable/lithium this week, but I couldn't reproduce it. However, on my colleague's machine, he can easily reproduce the issue many times. That's weird. |
| Comment by Hideyuki Tai [ 29/Jan/16 ] |
|
(In reply to Hideyuki Tai from comment #8) I mean in single node. |
| Comment by Hideyuki Tai [ 04/Feb/16 ] |
|
(In reply to Anil Vishnoi from comment #7) Hi Anil, My colleagues and I tried to reproduce the issue with the latest artifacts for Beryllium. So I'm changing the status of the bug report to "RESOLVED". Thank you for your great support! |
| Comment by Hideyuki Tai [ 11/Feb/16 ] |
|
The patch was merged into the master branch of openflowplugin.git. |