[OPNFLWPLUG-889] Switch disconnect while mastership is being negotiated produces stale switch entry Created: 10/May/17 Updated: 27/Sep/21 Resolved: 17/May/17 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Luis Gomez | Assignee: | Miroslav Macko |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
||
| External issue ID: | 8411 |
| Description |
|
This only happens in Carbon and it is tracked in the reconciliation suite:

To reproduce:
1) Start a 3-node cluster.
2) Start 1 switch (s1) and connect it to 2 nodes (.101 and .102):
   sudo ovs-vsctl set-controller s1 "tcp:192.168.0.101:6633" "tcp:192.168.0.102:6633"
3) Block communication from the owner to the switch (in this example .101):
   sudo iptables -A INPUT --source 192.168.0.101 -j DROP
4) Immediately afterwards, block communication from the remaining node to the switch:
   sudo iptables -A INPUT --source 192.168.0.102 -j DROP
5) Now observe the following after a few seconds:

The result of the above is a stale switch in operational; the only workaround is to reboot the entire cluster.

BR/Luis |
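The reproduction steps in the description can be scripted. What follows is a minimal Python sketch, not part of the original report. It builds the command lines from steps 2 to 4, plus matching `iptables -D` rules to undo the block afterwards, and executes them only when `dry_run=False`. The function names, the `dry_run` flag, and the `settle` pause are hypothetical; the switch name and addresses are the ones from the description. It assumes it runs on the host where the switch lives.

```python
import subprocess
import time

CONTROLLERS = ["192.168.0.101", "192.168.0.102"]  # owner first, as in the report


def set_controller_cmd(switch, ips, port=6633):
    """Step 2: point the switch at the given controllers."""
    targets = [f"tcp:{ip}:{port}" for ip in ips]
    return ["sudo", "ovs-vsctl", "set-controller", switch, *targets]


def block_cmd(ip):
    """Steps 3 and 4: drop packets arriving from one controller."""
    return ["sudo", "iptables", "-A", "INPUT", "--source", ip, "-j", "DROP"]


def unblock_cmd(ip):
    """Cleanup (not in the report): delete the rule added by block_cmd."""
    return ["sudo", "iptables", "-D", "INPUT", "--source", ip, "-j", "DROP"]


def reproduce(switch="s1", ips=CONTROLLERS, dry_run=True, settle=10):
    """Return the commands in order; execute them only if dry_run is False."""
    connect = set_controller_cmd(switch, ips)
    # Block the owner first, then the remaining node right after it.
    blocks = [block_cmd(ip) for ip in ips]
    if not dry_run:
        subprocess.run(connect, check=True)
        # Assumed pause so the switch can connect and an owner can be chosen.
        time.sleep(settle)
        for cmd in blocks:
            subprocess.run(cmd, check=True)
    return [connect, *blocks]


if __name__ == "__main__":
    for cmd in reproduce():
        print(" ".join(cmd))
```

With the default `dry_run=True` the script only prints the commands, so it can be reviewed before anything touches the firewall. After a real run, `unblock_cmd(ip)` gives the rule to remove for each address.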
| Comments |
| Comment by Luis Gomez [ 16/May/17 ] |
|
Also raising this to blocker; this issue is not seen in Boron, so we should fix it if we can. |
| Comment by Miroslav Macko [ 16/May/17 ] |
|
Hello Luis, I am not able to reproduce it locally on stable/carbon following your instructions. I checked /restconf/operational/network-topology:network-topology after blocking both controllers: the switch closes its connections and operational is clean. I tried blocking the SLAVE right after the MASTER without waiting for a new MASTER to be elected, and also after waiting for a new MASTER to be elected. The topology was cleaned up both times. I have also checked the Jenkins logs. Thanks, |
| Comment by Luis Gomez [ 16/May/17 ] |
|
Hi Miroslav, the issue is still there, see this recent build: I also realized that sometimes the switch is cleared from operational but not from entity owner, so when the switch reconnects it does not work. Please repeat the steps above and, when you have disconnected the slave member just after the master member, check the entity owner API (I always see the slave as master in entity owner when it is not connected). Also make sure you block controller-to-switch communication, not the other way around. If you have problems reproducing, I can set up a call and share my desktop. BR/Luis |
| Comment by Miroslav Macko [ 17/May/17 ] |
|
Hello Luis, Yes, I am following your instructions exactly. I am blocking controller-to-switch communication. Operational is still clean, but you are right that entity owner is not. http://10.0.42.201:8181/restconf/operational/entity-owners:entity-owners Is this what you meant? Or do you check it another way? Jozef will try to prepare a patch for it. Thank you, |
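The entity-owner check discussed in this thread can be sketched in code. This is a minimal Python sketch under stated assumptions: the endpoint is assumed to return JSON with an `entity-owners` object holding a list of `entity-type` entries, each with `entity` items that carry `id`, `owner`, and `candidate` fields. That shape, the helper name `owners_for`, the node id `openflow:1`, and the sample data are illustrative and not taken from this ticket. The function works on already-parsed JSON, so it makes no network call.

```python
def owners_for(doc, node_id):
    """Return (entity type, owner, candidate names) for entities mentioning node_id."""
    found = []
    types = doc.get("entity-owners", {}).get("entity-type", [])
    for etype in types:
        for entity in etype.get("entity", []):
            if node_id in entity.get("id", ""):
                candidates = [c.get("name") for c in entity.get("candidate", [])]
                found.append((etype.get("type"), entity.get("owner"), candidates))
    return found


# Made-up sample: the switch is disconnected, yet an owner is still recorded.
sample = {
    "entity-owners": {
        "entity-type": [
            {
                "type": "example-entity-type",
                "entity": [
                    {
                        "id": "/example:entity[name='openflow:1']",
                        "owner": "member-2",
                        "candidate": [{"name": "member-2"}],
                    }
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    print(owners_for(sample, "openflow:1"))
```

A non-empty result for a switch that is no longer connected to any controller would correspond to the stale entity-owner entry described in the comments.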
| Comment by A H [ 17/May/17 ] |
|
We are looking to build Carbon RC2 tomorrow 5/18 at 23:59 UTC time assuming there are no blocker bugs. Is there an ETA for when a fix can be merged and this bug resolved for stable/carbon branch? |
| Comment by Jozef Bacigal [ 17/May/17 ] |
|
https://git.opendaylight.org/gerrit/#/c/57232/9 Here is the patch. Luis, can you test it, please? Jozef |
| Comment by A H [ 17/May/17 ] |
|
(In reply to Jozef Bacigal from comment #6) This patch is failing to build in Jenkins and is still missing a +2 from committers. |
| Comment by Luis Gomez [ 17/May/17 ] |
|
Let me test this, although the patch test results do not look very good. |
| Comment by Luis Gomez [ 17/May/17 ] |
|
OK guys, bad and good news. BAD: GOOD: So I am closing this bug |