[OPNFLWPLUG-290] Flows failed to get install in the same openflow table when programmed from different Threads simultaneously. Created: 20/Sep/14 Updated: 27/Sep/21 Resolved: 19/Feb/15 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Madhu Venugopal | Assignee: | Madhu Venugopal |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 1997 | ||||||||
| Description |
|
When an openflow node connects to the controller, ovsdb's net-virt-provider bundle programs the switch with a set of pipeline flows (From Table 0 -> 110 in increments of 10, except Table 10 and 70). These pipeline flows are vital for openstack operation. But, randomly some of the flows fails to get programmed. 1 of the cases that we narrowed down is on Table 0 flow. This randomness happens based on the timing of another flow being installed from another thread. This is the LLDP flow programmed by another thread. When these 2 threads happen to operate simultaneously (or almost), one of the flows gets lost somewhere in the system. Actual flows programmed in OVS : https://gist.github.com/05b25649edd8ec6fa91c The strangeness of the situation is that, the Table0 pipeline flow (DEFAULT_PIPELINE_FLOW_0) that is missing in the actual flow-table in OVS is seen in the config tree & operational tree. Config : http://localhost:8181/restconf/config/opendaylight-inventory:nodes Operational : http://localhost:8181/restconf/operational/opendaylight-inventory:nodes => https://gist.github.com/bfc9e48b3d4fde91cb11 |
| Comments |
| Comment by Madhu Venugopal [ 20/Sep/14 ] |
|
Reproduction steps : |
| Comment by Ed Warnicke [ 20/Sep/14 ] |
|
Madhu, Question... are you guys writing only to the config tree? I ask, because writing to the operational tree is normally done by the statsmanager in response to the switch reporting back the flow. The only way off the top of my head I could think of that we would see the flow in the operational tree if it wasn't on the switch is if you guys are writing to both config and operational store (not accusing, just trying to eliminate possibilities). |
| Comment by Madhu Venugopal [ 20/Sep/14 ] |
|
OVSDB debug logs : https://gist.github.com/17f081e1dfe37c9ee56d CheckedFuture<Void, TransactionCommitFailedException> commitFuture = modification.submit(); catch (Exception e) { logger.error(e.getMessage(), e); modification.cancel(); } |
| Comment by Madhu Venugopal [ 20/Sep/14 ] |
|
(In reply to Ed Warnicke from comment #2) Yes. we write ONLY to the config tree. > I know, that is why I don't know what to do with this issue |
| Comment by Tony Tkacik [ 20/Sep/14 ] |
|
Quick questions: Are you able to see all your flows in configuration store? IF yes, this is bug of Openflowplugin/FRM/Stats Manager and not MD-SAL Infrastructure. This multiple threads are writing to datastore? E.g callback from datastore and LLDP discovery thread? |
| Comment by Madhu Venugopal [ 20/Sep/14 ] |
|
Reproduced again.. this time with pcap : Flows : https://gist.github.com/1926c569ff46bfcfac74 Config : https://gist.github.com/aaaa5d72a499fca19127 Operational : https://gist.github.com/592be004ee3fd21dc1c8 OVSDB Debug logs : https://gist.github.com/e31255c6ea9a82196e45 PCAP : https://www.dropbox.com/s/9rgolqu0x7enc9p/pipeline_flows_2.pcap?dl=0 |
| Comment by Madhu Venugopal [ 20/Sep/14 ] |
|
(In reply to Tony Tkacik from comment #5) > IF yes, this is bug of Openflowplugin/FRM/Stats Manager and not MD-SAL Could be. But when am opening the bug, it is difficult for me to spot it and hence opened it against MD-SAL. Please move it to appropriate component once the > Both these threads are internal to OVSDB codebase. > |
| Comment by Keith Burns [ 21/Sep/14 ] |
|
Was requested by Ed Warnicke to provide some data on this bug from the GroupBasedPolicy ProofOfConcept perspective, which we believe is hitting the same issue. The net that we have discovered is that flows are missing in our Proof of Concept (POC) which involves two OVS instances (each on a separate VM). Essentially, VM:odlgbp1 runs the OpenDaylight controller AND an OVS instance odlgbp2 runs an OVS instance (both in mininet). OVA's of both these VMs are available in my DropBox and access can be granted on request and Thomas and I are also available to share our test environment and run additional tests. Some of our debug steps have been captured in the following google doc (please note, this is our "scratchpad" so we can document some troubleshooting history, not a clean definitive document) : To simplify: 1. We see the requisite expected flows via REST. ASSUMPTION: Flows are correctly placed into the CONFIG datastore. Per Thomas' comments below: The net of this is that it appears that the flows make it to openflowjava, but get lost somewhere thereafter. From Thomas Bachman: configstore/ The configstore dir contains the result of REST queries to /restconf/config/opendaylight-inventory:nodes/node/openflow:<n>. The jconsole directory contains a text file of the stats from the msg-spy-service-impl RuntimeBean. The karaf-log directory contains the karaf log files obtained during the run (note: things to look for are "xid" for openflowjava and "Calling the FlowMod RPC method on MessageDispatchService AddFlow" for openflowplugin). The ovs-logs dir contains the result of running ovs-ofctl The wireshark dir contains the pcapng file and a text file containing the openflow messages seen on the wire between the two vSwitches. The net of this is that it appears that the flows make it to openflowjava, but get lost somewhere thereafter. |
| Comment by Ed Warnicke [ 21/Sep/14 ] |
|
Madhu, Could you attach: a) Your config datastore output from RESTCONF to the bug so we can look at that case? |
| Comment by Madhu Venugopal [ 21/Sep/14 ] |
|
(In reply to Ed Warnicke from comment #9) I already did that yesterday : |
| Comment by Ed Warnicke [ 21/Sep/14 ] |
|
Madhu, Apologies |
| Comment by Keith Burns [ 21/Sep/14 ] |
|
> The wireshark dir contains the pcapng file and a text file containing the Applying following filter: and manually counting the expanded OpenFlow 1.3 messages in each TCP message we get 116 across both switches. ie. Filtering on FlowMODs (14) and also the priorities we set in the GBP project, we should have 114 flows (57 per OVS instances). We are still tracking down the extra 1 flow per switch but suspect it is administration outside of our project. BUT Executing <sudo ovs-ofctl dump-flows <switch> -O Openflow13 | wc -l > on each OVS we get: This is perplexing as it would appear we are getting 116 flowmods (as per filter above) on the wire but only 112 flows total installed. |
| Comment by Michal Rehak [ 22/Sep/14 ] |
|
moved barrier after message: |
| Comment by Ed Warnicke [ 22/Sep/14 ] |
| Comment by Michal Rehak [ 23/Sep/14 ] |
|
Mentioned patch eliminates collisions during add session and remove session phases. Those can occur simultaneously and a collision usually results into NPE in remove session thread and registered but invalid session. Those can cause message lost and failing rpcs. Please retest and eventually mark as resolved. |