[NETVIRT-108] Mac learning in vlan provider networks clobbers static flows with wrong flows Created: 30/Aug/16  Updated: 29/May/18  Resolved: 02/Sep/16

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Boron
Fix Version/s: None

Type: Bug
Reporter: Josh Hershberg Assignee: Josh Hershberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 6584

 Description   

Here are flow installations to node1:
[jhershbe@josh-laptop current]$ grep -E "MOD.* table:51" snoop.control.out
OFPT_FLOW_MOD (OF1.3) (xid=0xa86): ADD table:51 priority=20,metadata=0x1388000000/0xffff000000,dl_dst=fa:16:3e:cb:d1:25 cookie:0x8031388 actions=load:0x500->NXM_NX_REG6[],resubmit(,220)
OFPT_FLOW_MOD (OF1.3) (xid=0xaee): ADD table:51 priority=20,metadata=0x1388000000/0xffff000000,dl_dst=fa:16:3e:87:6e:96 cookie:0x8031388 actions=load:0x600->NXM_NX_REG6[],resubmit(,220)
OFPT_FLOW_MOD (OF1.3) (xid=0xaf7): ADD table:51 priority=20,metadata=0x1388000000/0xffff000000,dl_dst=fa:16:3e:6a:4d:36 cookie:0x8031388 actions=load:0x100->NXM_NX_REG6[],resubmit(,220)
OFPT_FLOW_MOD (OF1.3) (xid=0xaff): ADD table:51 priority=20,metadata=0x1388000000/0xffff000000,dl_dst=fa:16:3e:87:6e:96 cookie:0x8031388 actions=load:0x100->NXM_NX_REG6[],resubmit(,220)
Note that the flow for the MAC address ending in :96 is installed twice. The first time, reg6 is set to 0x600, which correctly outputs on port 3. The second time, reg6 is set to 0x100, which incorrectly sends the packet out of port 1, the VLAN port. The first, correct flow is installed when the ElanInterface is added. The second, incorrect flow is installed as a result of a PACKET_IN message sent to the controller with a source MAC of :96. How does this happen, and why is the wrong flow calculated? Like this:
1) A packet is sent from node1 to node2 over the provider VLAN network.
2) The packet is received on node2, begins to traverse the pipeline, and reaches the smac table (50).
3) The packet has a source address of :96 but does not match any entry in the smac table, because :96 is not local to node2; the packet came across the provider network from node1.
4) Table 50's default action (CONTROLLER) fires and a PACKET_IN is sent to the controller.
5) On the controller, the packet reaches ElanUtils.setupOrigDmacFlows with macAddress=:96, the interface set to the VLAN port on node2, and the dpid set to node2. Therefore...
6) ...when the remote dmac flow is calculated, it is essentially asking "where should I send traffic out on node1 to reach the VLAN port on node2?". Answer: the VLAN port on node1, which gives us the second flow (0x100), and that flow overwrites the original, correct one.
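
The overwrite in step 6 can be reproduced offline from the four FLOW_MOD lines quoted earlier. The following is a minimal Python sketch, not netvirt code: it relies on the OpenFlow rule that an ADD with the same table, priority, and match as an existing entry replaces that entry, and it reports every case where the replacement changes the actions. The names `FLOW_MODS`, `FLOW_RE`, and `apply_flow_mods` are made up for this illustration.

```python
import re

# The four FLOW_MOD lines from the snoop output above (xid prefix dropped).
COMMON = "ADD table:51 priority=20,metadata=0x1388000000/0xffff000000,"
FLOW_MODS = [
    COMMON + "dl_dst=fa:16:3e:cb:d1:25 cookie:0x8031388 actions=load:0x500->NXM_NX_REG6[],resubmit(,220)",
    COMMON + "dl_dst=fa:16:3e:87:6e:96 cookie:0x8031388 actions=load:0x600->NXM_NX_REG6[],resubmit(,220)",
    COMMON + "dl_dst=fa:16:3e:6a:4d:36 cookie:0x8031388 actions=load:0x100->NXM_NX_REG6[],resubmit(,220)",
    COMMON + "dl_dst=fa:16:3e:87:6e:96 cookie:0x8031388 actions=load:0x100->NXM_NX_REG6[],resubmit(,220)",
]

# Hypothetical pattern for this log format only; not a general ovs-ofctl parser.
FLOW_RE = re.compile(
    r"ADD table:(?P<table>\d+) priority=(?P<priority>\d+),(?P<match>\S+) "
    r"cookie:\S+ actions=(?P<actions>\S+)"
)


def apply_flow_mods(lines):
    """Replay ADDs in order; return the final table and the clobbered entries."""
    table = {}      # (table id, priority, match) -> actions
    clobbered = []  # (key, old actions, new actions)
    for line in lines:
        m = FLOW_RE.search(line)
        if m is None:
            continue  # not a FLOW_MOD ADD line in the expected format
        key = (int(m["table"]), int(m["priority"]), m["match"])
        old = table.get(key)
        if old is not None and old != m["actions"]:
            clobbered.append((key, old, m["actions"]))
        # Same table, priority, and match: the new entry replaces the old one.
        table[key] = m["actions"]
    return table, clobbered


if __name__ == "__main__":
    final_table, clobbered = apply_flow_mods(FLOW_MODS)
    for key, old, new in clobbered:
        print("clobbered", key[2])
        print("  was:", old)
        print("  now:", new)
```

Because the function uses `search`, the unmodified lines from `snoop.control.out` (including the `OFPT_FLOW_MOD ... (xid=...)` prefix) can be passed in as well.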

I checked how this works with VXLAN. What I found was that packets coming in off the VXLAN interface do not pass through the smac table (50) at all. They are sent straight to table 36 (INTERNAL_TUNNEL_TABLE), and from there they go to table 220 for output.
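
To make the difference between the two ingress paths concrete, here is a small Python sketch of the behaviour described above. It is a simplified model under stated assumptions, not the real pipeline: only the table numbers mentioned in this report are used, any tables between 0 and 50 are omitted, and `trace`, `NODE2_LOCAL_MACS`, and the MAC in that set are hypothetical.

```python
# Table numbers as given in this report.
SMAC_TABLE = 50
INTERNAL_TUNNEL_TABLE = 36
OUTPUT_TABLE = 220

# Hypothetical: MACs of interfaces that live on node2 (placeholder value).
NODE2_LOCAL_MACS = {"fa:16:3e:00:00:01"}


def trace(ingress, src_mac, local_macs):
    """Return (tables visited, True if a PACKET_IN is sent to the controller)."""
    if ingress == "vxlan":
        # Tunnel traffic skips the smac table, so no MAC learning is triggered.
        return [0, INTERNAL_TUNNEL_TABLE, OUTPUT_TABLE], False
    if ingress == "vlan":
        # Provider VLAN traffic reaches the smac table; an unknown source MAC
        # hits the table's default CONTROLLER action.
        punted = src_mac not in local_macs
        return [0, SMAC_TABLE], punted
    raise ValueError("unknown ingress type: " + ingress)


if __name__ == "__main__":
    remote_mac = "fa:16:3e:87:6e:96"  # lives on node1, seen on node2
    print(trace("vlan", remote_mac, NODE2_LOCAL_MACS))
    print(trace("vxlan", remote_mac, NODE2_LOCAL_MACS))
```

In this model only the VLAN path can produce the PACKET_IN that leads to the bad flow, which is why routing provider VLAN traffic the same way as VXLAN traffic is a candidate fix.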

So, should VLAN work like VXLAN and move the traffic from table 0 to table 36 to table 220? I'm going to try to find the code that handles that for VXLAN and see what I can figure out, but I wanted to share this analysis and see whether you had any suggestions.



 Comments   
Comment by A H [ 02/Sep/16 ]

Is there an ETA for this bug, and is someone assigned to fix it?
