[OPNFLWPLUG-668] [Clustering] Switch state resync after cluster restart. Created: 05/Apr/16 Updated: 27/Sep/21 Resolved: 29/Jul/16 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Saibal Roy | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 5659 |
| Priority: | High |
| Description |
|
Hi,

I was testing the cluster restart scenario with the latest Be code with the He plugin.

Build used:

Objective of test:

Configuration and topology:
i. Controller VMs (c1, c2 and c3) are running on a Dell machine, say h1; each VM has 8 vCPUs and 16 GB RAM.
ii. Mininet VMs (m1, m2 and m3) with OVS version 2.3.2 are running on a different Dell machine, say h2; each VM has 8 vCPUs and 16 GB RAM. m1 with 5 switches (1 to 6) connected to c1

Test steps:

Observations:

Attaching the logs for more clarity.

Thanks & Regards, |
| Comments |
| Comment by Saibal Roy [ 05/Apr/16 ] |
|
Attachment 15switches.zip has been added with description: logs for Switch state resync after cluster restart |
| Comment by Muthukumaran Kothandaraman [ 05/Apr/16 ] |
|
Hi Saibal,

Judging by the logs and the symptoms you observed, this could be a case where the datastore is not yet fully available (all persisted data restored consistently across the cluster) when the switches reconnect. Specifically, the switches are constantly probing for port 6633 to open on all cluster nodes. As soon as the port opens (rather prematurely), the switches pounce on the controller nodes like hungry tigers. At that point the datastore may still be coming up (restoring data, etc.).

One quick way to verify this is to add a Linux firewall rule, as part of karaf.sh, that blocks port 6633 for 3-5 minutes. This prevents switches from connecting before the datastore becomes fully "ready". If this hack confirms clean behavior, we can discuss a cleaner solution that keeps 6633 from opening until all backends are in a clean "ready" state.

Regards |
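The firewall hack suggested above could look roughly like the following. This is only a sketch under assumptions: the comment does not say which firewall tool to use, so `iptables` with a `DROP` rule on the `INPUT` chain is assumed, the 240-second delay is an arbitrary value inside the suggested 3-5 minute window, and the helper names `firewall_rule` and `hold_port_closed` are hypothetical. It is written in Python rather than as a direct patch to karaf.sh so that the command construction can be checked without root.

```python
import subprocess
import time

OPENFLOW_PORT = 6633  # port named in the comment above


def firewall_rule(action, port=OPENFLOW_PORT):
    """Build an iptables argv; action is "-I" (insert) or "-D" (delete).

    Assumption: iptables with a DROP rule on the INPUT chain.
    """
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "-j", "DROP"]


def hold_port_closed(delay_seconds=240, port=OPENFLOW_PORT,
                     run=subprocess.check_call, sleep=time.sleep):
    """Block inbound TCP to `port`, wait `delay_seconds`, then unblock.

    `run` and `sleep` are injectable so the sequence can be exercised
    without root privileges or a real delay.
    """
    run(firewall_rule("-I", port))
    try:
        sleep(delay_seconds)
    finally:
        # Always lift the block, even if the wait is interrupted.
        run(firewall_rule("-D", port))
```

Calling `hold_port_closed()` as root on each controller node just before Karaf starts would keep switches from connecting for the chosen delay. A fixed delay only approximates datastore readiness; it is meant to confirm the hypothesis, not to serve as the final fix.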
| Comment by Muthukumaran Kothandaraman [ 29/Jul/16 ] |
|
To be retested on the latest Boron master with the Lithium plugin combination to re-establish this, mainly because the Boron release is moving with the Lithium plugin.