[OPNFLWPLUG-591] [Clustering]: Openflow connections unstable with Lithium plugin Created: 11/Jan/16 Updated: 27/Sep/21 Resolved: 01/Apr/16 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Saibal Roy | Assignee: | Saibal Roy |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| External issue ID: | 4925 | ||||||||
| Priority: | High | ||||||||
| Description |
|
Build used :
Test Type :
Objective of test :
Test Steps :
Start the custom Mininet topology:
sudo mn --custom /home/mininet/mininet/custom/mytopo.py --topo mytopo --controller remote,ip=10.183.181.41,port=6633 --switch ovsk,protocols=Openflow13
Note: mytopo.py is attached for quick reference.
3. Check whether the configured count of switches has established connections with the controller instances using netstat:
netstat -pan | grep 6633 | grep ESTABLISHED | grep ovs-vswitchd | wc -l
Test Results:
1. Even without any flow traffic or LLDP traffic, the switch connections keep flapping with respect to the controller.
2. The Operational Datastore reflects a completely different count of switches in the inventory shards (perhaps this is a side effect of BZ-4576 - https://bugs.opendaylight.org/show_bug.cgi?id=4576).
Attaching all the karaf logs of the 3 controller nodes and the netstat snapshot of switch connectivity.
Thanks & Regards, |
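The netstat pipeline in the test steps counts established ovs-vswitchd connections to the controller's OpenFlow port. As a minimal sketch, the same filter can be expressed in Python; the helper name and the sample netstat lines below are hypothetical, for illustration only, and are not part of the attached scripts or logs.

```python
# Hypothetical helper: count ESTABLISHED ovs-vswitchd connections on the
# OpenFlow port, equivalent to:
#   netstat -pan | grep 6633 | grep ESTABLISHED | grep ovs-vswitchd | wc -l
OF_PORT = ":6633"

def count_established(netstat_output: str) -> int:
    """Count netstat lines matching the port, state, and process filters."""
    return sum(
        1
        for line in netstat_output.splitlines()
        if OF_PORT in line and "ESTABLISHED" in line and "ovs-vswitchd" in line
    )

# Illustrative (made-up) netstat output: two established OVS connections,
# plus one connection in TIME_WAIT that the filter should skip.
sample = """\
tcp  0 0 10.183.181.50:45210 10.183.181.41:6633 ESTABLISHED 2311/ovs-vswitchd
tcp  0 0 10.183.181.50:45211 10.183.181.41:6633 ESTABLISHED 2311/ovs-vswitchd
tcp  0 0 10.183.181.50:45212 10.183.181.41:6633 TIME_WAIT   -
"""
print(count_established(sample))  # prints 2
```

Comparing this count against the configured number of switches after each reconnect is what surfaces the flapping described in the test results.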
| Comments |
| Comment by Saibal Roy [ 11/Jan/16 ] |
|
Attachment Bugs.rar has been added with description: karaf logs of 3 controller and netstat snapshot of switch connectivity. |
| Comment by Michal Rehak [ 18/Jan/16 ] |
|
Hi, I looked briefly over the attached logs, and it seems to me that shard members 1 and 2 got stuck while connecting to the DS master. This resulted in many transaction submit failures and finally in a "too many open files" exception. On member-3 there are conflicting transactions trying to write data, which leads to optimistic-lock exceptions. It looks like there are many reconnects on this node, and after each one there is a DS cleanup phase which probably went wrong once; from that point on it keeps blocking new connections. Could you retest and check the logs, before connecting OVS, for any DS-master connection issues? Could you also test with just 3 devices, to narrow down the possible causes. Thank you. |
| Comment by Vratko Polak [ 19/Jan/16 ] |
|
Off topic:
> please do not use rar compression
What about tar.xz? |
| Comment by Saibal Roy [ 20/Jan/16 ] |
|
Attachment 20-1-2016.zip has been added with description: logs for switch stability connectivity |
| Comment by Saibal Roy [ 20/Jan/16 ] |
|
Hi, I brought 3 controllers up, and I saw from jconsole that member-1 had become the leader. Please find the details in the attached logs.
1. I first connected 1 switch per controller (3 switches in total) and observed that switch connectivity was not lost; I checked for 30 minutes and the connectivity was persistent.
2. I then connected 5 switches per controller (15 switches in total) and again saw no loss of connectivity; everything looked fine in the karaf logs.
3. Finally, I connected 10 switches per controller (30 switches in total) and observed for 1 hour; here the connectivity was lost, and I could also see OptimisticLockFailedException in the logs.
Attaching the logs for more details. Thanks & Regards, |
| Comment by Vaclav Demcak [ 02/Mar/16 ] |
|
Could you please confirm whether this behavior has changed with the current stable/lithium or stable/beryllium code base? |
| Comment by Muthukumaran Kothandaraman [ 07/Mar/16 ] |
|
Hi Vaclav, we ran this on beryllium master as of 11-Jan-2016. Now that we have stable/beryllium, and a lot of water has flowed under the bridge since 11-Jan, we will retry this with the latest stable/beryllium and update the status, so we can see how to take this forward. |
| Comment by Muthukumaran Kothandaraman [ 01/Apr/16 ] |
|
The latest Beryllium Stable build was run with JDK 1.8, with the rest of the scenario remaining the same as above. Observation: for the same count of 20 switches (OVS 2.3.2), this issue was not observed. Closing this bug. If we encounter scaling issues with an increased number of switches, we can treat that as a separate issue. Hope we are in sync. |