[OVSDB-309] Arp flows are not getting installed properly in compute nodes Created: 26/Feb/16  Updated: 30/Oct/17  Resolved: 21/Mar/16

Status: Resolved
Project: ovsdb
Component/s: openstack.net-virt
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Hari Prasidh Assignee: Oded Shvartz
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive karaf_logs.zip    
External issue ID: 5435

 Description   

I am analyzing the ARP responder feature in ODL with OpenStack.
I am using one control node and two compute nodes.
I created two networks (tenant1 and tenant2).

When I add a router interface for tenant1, ARP flows are installed on both the control and compute nodes as expected, for the DHCP port and the default gateway.

Then, when I add a router interface for tenant2, the ARP flows are installed on the control node as expected, but on the compute nodes the tenant1 default gateway flow is overwritten with the tenant2 default gateway flow.

Please find the flow dumps below:

Added Router interface with tenant1 (10.0.0.0/24)

Control node :

cookie=0x0, duration=496.642s, table=0, n_packets=3, n_bytes=258, priority=0 actions=goto_table:20
cookie=0x0, duration=76.476s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:7b:03->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea47b03->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=45.164s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x400,arp_tpa=20.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:3c:f1:d5->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e3cf1d5->NXM_NX_ARP_SHA[],load:0x14000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=2.998s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:bf:1b->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea4bf1b->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=496.642s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Compute1:

cookie=0x0, duration=314.362s, table=0, n_packets=10, n_bytes=788, priority=0 actions=goto_table:20
cookie=0x0, duration=8.498s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:bf:1b->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea4bf1b->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=314.362s, table=20, n_packets=10, n_bytes=788, priority=0 actions=goto_table:30

Compute2:

cookie=0x0, duration=313.912s, table=0, n_packets=10, n_bytes=788, priority=0 actions=goto_table:20
cookie=0x0, duration=14.214s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:bf:1b->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea4bf1b->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=313.912s, table=20, n_packets=10, n_bytes=788, priority=0 actions=goto_table:30
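
For reference when reading these dumps, the load values encode the gateway addresses as plain integers: NXM_OF_ARP_SPA carries the IPv4 address as a 32-bit value (0xa000001 is 10.0.0.1, 0x14000001 is 20.0.0.1) and NXM_NX_ARP_SHA carries the MAC as a 48-bit value (0xfa163ea4bf1b is fa:16:3e:a4:bf:1b). A small standalone sketch of the conversion (illustrative only, not part of net-virt):

```java
// Helper showing how the load: values in the ARP responder flows map to
// IPs and MACs. Not part of net-virt; just for decoding the dumps above.
public class ArpFieldEncoding {

    // "10.0.0.1" -> 0xa000001 (the value loaded into NXM_OF_ARP_SPA[])
    static long ipToNxm(String ip) {
        long v = 0;
        for (String octet : ip.split("\\.")) {
            v = (v << 8) | Integer.parseInt(octet);
        }
        return v;
    }

    // "fa:16:3e:a4:bf:1b" -> 0xfa163ea4bf1b (the value loaded into NXM_NX_ARP_SHA[])
    static long macToNxm(String mac) {
        long v = 0;
        for (String octet : mac.split(":")) {
            v = (v << 8) | Integer.parseInt(octet, 16);
        }
        return v;
    }

    public static void main(String[] args) {
        // tenant1 default gateway from the dumps above
        System.out.println(Long.toHexString(ipToNxm("10.0.0.1")));           // a000001
        System.out.println(Long.toHexString(macToNxm("fa:16:3e:a4:bf:1b"))); // fa163ea4bf1b
    }
}
```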

Added Router interface with tenant2 (20.0.0.0/24)

Control:

cookie=0x0, duration=591.999s, table=0, n_packets=3, n_bytes=258, priority=0 actions=goto_table:20
cookie=0x0, duration=140.521s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x400,arp_tpa=20.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:3c:f1:d5->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e3cf1d5->NXM_NX_ARP_SHA[],load:0x14000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=171.833s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:7b:03->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea47b03->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=98.355s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x412,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a4:bf:1b->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ea4bf1b->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=8.945s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x400,arp_tpa=20.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:e6:97:f1->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee697f1->NXM_NX_ARP_SHA[],load:0x14000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=591.999s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Compute1:

cookie=0x0, duration=427.876s, table=0, n_packets=10, n_bytes=788, priority=0 actions=goto_table:20
cookie=0x0, duration=32.892s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x400,arp_tpa=20.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:e6:97:f1->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee697f1->NXM_NX_ARP_SHA[],load:0x14000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=427.876s, table=20, n_packets=10, n_bytes=788, priority=0 actions=goto_table:30

Compute2:

cookie=0x0, duration=450.695s, table=0, n_packets=10, n_bytes=788, priority=0 actions=goto_table:20
cookie=0x0, duration=61.874s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x400,arp_tpa=20.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:e6:97:f1->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee697f1->NXM_NX_ARP_SHA[],load:0x14000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=450.695s, table=20, n_packets=10, n_bytes=788, priority=0 actions=goto_table:30



 Comments   
Comment by Hari Prasidh [ 26/Feb/16 ]

Attachment karaf_logs.zip has been added with description: karaf_logs

Comment by Anil Vishnoi [ 28/Feb/16 ]

Just a wild guess: net-virt may be using overlapping flow names, so when it tries to install the flow for tenant2 it generates the same flow name and ends up modifying the existing flow rather than installing a new one.

Comment by Oded Shvartz [ 29/Feb/16 ]

This issue is related to the fact that before we create ARP responder rules we validate that the rules are needed on the specific node, and if not we delete them:

DistributedArpService:

// Arp rule is only needed when segmentation exists in the given node (OVSDB-235).
boolean arpNeeded = tenantNetworkManager.isTenantNetworkPresentInNode(node, providerSegmentationId);
final Action actionForNode = arpNeeded ? actionToPerform : Action.DELETE;
...

Once the network is present on the node (when the first VM is created there), the ARP responder rules will be created (for the DHCP port and for the router interface).

Now the question is whether we want to fix this by not installing the new rule either.

Comment by Oded Shvartz [ 01/Mar/16 ]

(In reply to Oded Shvartz from comment #2)

So after more debugging I found why the new rule (after adding the second interface to the router) was not deleted like the first interface's rule.

After calling distributedArpService.handlePortEvent (where we delete all the rules that are not needed on a specific node), we then call neutronL3Adapter.handleNeutronPortEvent, and there we do not check whether the rule is needed, because we assume that ARP responder rules for router interfaces are always needed:

// Enable ARP responder by default, because router interface needs to be responded always.
distributedArpService.programStaticRuleStage1(dpid, destinationSegmentationId, macAddress, ipStr, actionForNode);

From a functionality perspective I don't think we need it (I hope I'm not missing something here), because once we have the first VM for this network on the node we will create the rule. But I think we should be more consistent and decide in both places whether we want to create those rules.
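
To make the two call sites consistent, both could share one gate. The following is only an illustrative model of that decision (the class, method, and enum names are stand-ins, not the actual net-virt code), mirroring the isTenantNetworkPresentInNode check that DistributedArpService already applies:

```java
import java.util.Set;

// Toy model of the decision both code paths could share: install the ARP
// responder rule only on nodes that carry the network's segment, otherwise
// delete it. Names here are stand-ins for the real net-virt classes.
public class ArpRuleDecision {
    enum Action { ADD, DELETE }

    // Stand-in for tenantNetworkManager.isTenantNetworkPresentInNode(...)
    static boolean networkPresentOnNode(Set<String> segmentsOnNode, String segmentationId) {
        return segmentsOnNode.contains(segmentationId);
    }

    // Same gate for both the DHCP-port path and the router-interface path.
    static Action decide(Set<String> segmentsOnNode, String segmentationId, Action requested) {
        return networkPresentOnNode(segmentsOnNode, segmentationId) ? requested : Action.DELETE;
    }

    public static void main(String[] args) {
        // Compute1 in the dumps above only carries tenant1 (tun_id 0x412),
        // so the tenant2 (0x400) gateway rule would be deleted, not installed.
        Set<String> compute1 = Set.of("0x412");
        System.out.println(decide(compute1, "0x412", Action.ADD)); // ADD
        System.out.println(decide(compute1, "0x400", Action.ADD)); // DELETE
    }
}
```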

Comment by Hari Prasidh [ 01/Mar/16 ]

Hi Oded,

Assume we have 2 networks (tenant1 and tenant2) and 2 VM instances on the control node, one instance on compute1, and one instance on compute2:

Control node VMs
VM1 (10.0.0.3)
VM2 (20.0.0.3)

Compute1 node VM
VM3 (10.0.0.4)

Compute2 node VM
VM4 (20.0.0.4)

Now, if we try to ping from VM4 (20.0.0.4) to VM1 (10.0.0.3) and compute2 does not have ARP flows for the default gateway IP, then the ARP request will be broadcast to all nodes.

So I think the ARP responder should be enabled for the router interface by default.

Comment by Oded Shvartz [ 02/Mar/16 ]

(In reply to hari prasad from comment #4)

Hi Hari,

I think that when VM4 (20.0.0.4) tries to ping VM1 (10.0.0.3), it will send an ARP request to find the default gateway on network 20.0.0.0, and we will have the ARP responder rule for it, because the rule is added once you create the VM (VM4).

VM4 will then send the packet to the default gateway (OVS), and OVS will do the routing without sending an ARP request. You can look at table 70, which adds the relevant MAC for the requested IP (10.0.0.4 in your case).

Please let me know what you think.

Comment by Sam Hague [ 21/Mar/16 ]

be: https://git.opendaylight.org/gerrit/#/c/36268/1
master: https://git.opendaylight.org/gerrit/#/c/36231/

Generated at Wed Feb 07 20:36:03 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.