[OVSDB-253] Arp responder (and Icmp echo responder) rules not removed properly from ovs (L3 DVR) Created: 10/Jan/16  Updated: 29/May/18  Resolved: 09/Mar/16

Status: Resolved
Project: ovsdb
Component/s: openstack.net-virt
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Josh Hershberg Assignee: Josh Hershberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive dump_flows&karaf_log.zip    
External issue ID: 4917
Priority: High

 Description   

After the router interface and the router are deleted, "ovs-ofctl -O OpenFlow13 dump-flows br-int" shows the rules still present.
Note that I observed this while adding the ICMP echo rules but validated that this occurs without those rules as well.



 Comments   
Comment by Flavio Fernandes [ 11/Jan/16 ]

Is this a duplicate of OVSDB-242?

Comment by Sam Hague [ 12/Jan/16 ]

(In reply to Flavio Fernandes from comment #1)
> Is this a duplicate of OVSDB-242?

Possibly related, but different. 4844 is that br-ex has gone away but the ARP responder code is still trying to ARP for a gateway. This bug is that there are still flows on br-int that should have been removed when the neutron router interface was deleted. Both outcomes might be related to some cleanup code.

Comment by Flavio Fernandes [ 13/Jan/16 ]

(In reply to Sam Hague from comment #2)
> (In reply to Flavio Fernandes from comment #1)
> > Is this a duplicate of OVSDB-242?
>
> Possibly related, but different. 4844 is that br-ex has gone away but the
> ARP responder code is still trying to ARP for a gateway. This bug is that
> there are still flows on br-int that should have been removed when the
> neutron router interface was deleted. Both outcomes might be related to
> some cleanup code.

Ah, OK. The logic for stopping the ARP responder is triggered by the neutron port removal, so I'm thinking this is a dup. Let's revisit this once 4844 is fixed.

Comment by Sam Hague [ 02/Feb/16 ]

Josh, is this bug fixed?

Comment by Hari Prasidh [ 05/Feb/16 ]

From our observation:

When the router is deleted, NeutronL3Adapter receives the delete event and the ARP flow for the default gateway IP is deleted.

When the tenant network is deleted, two events are received for the DHCP port, an update event followed by the delete event, at both DistributedArpService and NeutronL3Adapter. The ARP flows for the DHCP port are deleted as part of the delete event.

In NeutronL3Adapter, as part of the update event, we observed that ARP flows are written for the default gateway IP. This rewrites the flows that were just deleted as part of the router delete operation: the update event was received for the DHCP port, but the ARP flow was created for the default gateway IP.

Dump flows for the above operations are given below.

Dump flows before the router and tenant delete operations:

cookie=0x0, duration=309.454s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x419,arp_tpa=20.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:fd:cd:7a->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163efdcd7a->NXM_NX_ARP_SHA[],load:0x14000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=59.004s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x411,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:e8:36:09->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee83609->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=402.600s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x411,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:37:4f:cb->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e374fcb->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=41.220s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x419,arp_tpa=20.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:68:80:e1->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e6880e1->NXM_NX_ARP_SHA[],load:0x14000001->NXM_OF_ARP_SPA[],IN_PORT

Dump flows after router delete operation:

cookie=0x0, duration=763.474s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x419,arp_tpa=20.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:fd:cd:7a->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163efdcd7a->NXM_NX_ARP_SHA[],load:0x14000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=856.620s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x411,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:37:4f:cb->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e374fcb->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT

Dump flows after tenant delete operation:

We deleted the tenant network on the 10.0.0.x subnet. The DHCP ARP flow for DHCP IP 10.0.0.2 is removed, but the ARP flow for default gateway IP 10.0.0.1 is added again.

cookie=0x0, duration=812.489s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x419,arp_tpa=20.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:fd:cd:7a->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163efdcd7a->NXM_NX_ARP_SHA[],load:0x14000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=10.092s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x411,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:e8:36:09->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee83609->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT

Our understanding is that the cache is not cleared correctly during the delete event handling for the default gateway IP.
We would like your suggestions so that we can work on resolving this bug.

Comment by Sam Hague [ 05/Feb/16 ]

I think this is because of gatewayMacResolved() or something related. That gets triggered from packet-ins, so it is called asynchronously from the delete events. Eventually that method goes to write the ARP flow. We should probably have a check before writing the flow to ensure that the network still exists; if not, don't write the flow. We also might want to stop ARPing for the gateway if it is no longer needed, i.e. if there are no more networks using it.
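As a minimal sketch of that check (the signature, cache, and helper names below are assumptions for illustration, not the actual NeutronL3Adapter code):

private void gatewayMacResolved(String networkUuid, String gatewayMac) {
    // Resolution completes asynchronously via packet-in, so the network may
    // have been deleted in the meantime. Only program the ARP responder flow
    // if the network still exists in the cache.
    if (neutronNetworkCache.getNetwork(networkUuid) == null) {
        LOG.debug("Network {} no longer exists; skipping ARP flow write", networkUuid);
        return;
    }
    programGatewayArpResponderFlow(networkUuid, gatewayMac); // assumed helper
}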

Another place to look is the doNeutronNetworkDeleted() method. That method was added before we had the caches, so I wonder if it is still needed now that the caches keep track of the async events. This may not be related to the error in this bug, but we should still look at whether it is needed.

Comment by Josh Hershberg [ 08/Feb/16 ]

OK. Here's what I've found.

When you remove the router interface, the ARP and ICMP flows are successfully removed from OVS... BUT they are added again if you delete the subnet, because the router is still in subnetIdToRouterInterfaceCache.

Here are the details:

1) When you remove the router interface, the local caches are not properly cleaned. This is due to the following code, which is invoked at the end of handleNeutronPortEvent:
private void cleanupRouterCache(final NeutronRouter_Interface neutronRouterInterface) {
    /*
     * Fix for 4277
     * Remove the router cache only after deleting the neutron
     * port l3 flows.
     */
    final NeutronPort neutronPort = neutronPortCache.getPort(neutronRouterInterface.getPortUUID());

    if (neutronPort != null) {
        networkIdToRouterMacCache.remove(neutronPort.getNetworkUUID());
        networkIdToRouterIpListCache.remove(neutronPort.getNetworkUUID());
        subnetIdToRouterInterfaceCache.remove(neutronRouterInterface.getSubnetUUID());
    }
}
neutronPort is always null here because the port has already been removed from the model in neutron northbound. As such, the various caches are never cleared.
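One possible shape for a fix (a sketch only; portCleanupCache is an assumed name for a locally maintained copy of deleted ports, not necessarily what was merged):

private void cleanupRouterCache(final NeutronRouter_Interface neutronRouterInterface) {
    NeutronPort neutronPort = neutronPortCache.getPort(neutronRouterInterface.getPortUUID());
    if (neutronPort == null) {
        // Northbound has already removed the port, so fall back to a local copy.
        neutronPort = portCleanupCache.get(neutronRouterInterface.getPortUUID());
    }
    if (neutronPort != null) {
        networkIdToRouterMacCache.remove(neutronPort.getNetworkUUID());
        networkIdToRouterIpListCache.remove(neutronPort.getNetworkUUID());
    }
    // The subnet-to-interface mapping does not need the port at all.
    subnetIdToRouterInterfaceCache.remove(neutronRouterInterface.getSubnetUUID());
}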

2) When you issue a "neutron subnet-delete" command, neutron starts by issuing an update to the ports on that subnet before the subnet is deleted.

3) In the function handleNeutronPortEvent, the following loop causes the flows to be rewritten:
for (Neutron_IPs neutronIP : neutronPort.getFixedIPs()) {
    NeutronRouter_Interface neutronRouterInterface =
            subnetIdToRouterInterfaceCache.get(neutronIP.getSubnetUUID());
    if (neutronRouterInterface != null) {
        this.handleNeutronRouterInterfaceEvent(null /*neutronRouter*/, neutronRouterInterface, action);
    }
}
In the function handleNeutronRouterInterfaceEvent, the flows are re-configured.
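A sketch of one way to guard that loop (an assumption, not the merged fix): before re-dispatching the router interface event on a port update, confirm the interface's own port still exists, so an in-flight subnet delete does not re-program flows that were just removed:

for (Neutron_IPs neutronIP : neutronPort.getFixedIPs()) {
    NeutronRouter_Interface routerInterface =
            subnetIdToRouterInterfaceCache.get(neutronIP.getSubnetUUID());
    // Skip the rewrite if the router interface port is already gone from
    // the northbound model (e.g. during a subnet delete).
    if (routerInterface != null
            && neutronPortCache.getPort(routerInterface.getPortUUID()) != null) {
        this.handleNeutronRouterInterfaceEvent(null /*neutronRouter*/, routerInterface, action);
    }
}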

Comment by Josh Hershberg [ 08/Feb/16 ]

UPDATE: This issue is resolved by undoing this change:

https://github.com/stackforge/networking-odl

which confirms the analysis I posted above.

However, there are still a few wrinkles.

1) ArpResponder flows for the DHCP addresses remain in the MD-SAL but are not in OVS. After restarting OVS, these disappear.

2) L3Forwarding flows are present in both the MD-SAL and OVS.

3) LocalTableMiss and TunnelMiss flows are still present in the MD-SAL and OVS.

I suspect these are unrelated issues but want to make sure.

Comment by Josh Hershberg [ 09/Feb/16 ]

Sorry, the commit this reverts is https://git.opendaylight.org/gerrit/#/c/27521/

Comment by Hari Prasidh [ 12/Feb/16 ]

Hi,

I tested with the patch below, which was recently merged:
https://git.opendaylight.org/gerrit/#/c/34308/

Here I noticed that the ARP entries for the DHCP port are still not deleted on the control node, even after we deleted the router and network.

Please find the dump flows below:

Add network (tenant1):

cookie=0x0, duration=83.198s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x434,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:0f:db:b9->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e0fdbb9->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=3034.459s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Add router interface with tenant1:

cookie=0x0, duration=117.513s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x434,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:0f:db:b9->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e0fdbb9->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=21.115s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x434,arp_tpa=10.0.0.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:37:b8:cf->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e37b8cf->NXM_NX_ARP_SHA[],load:0xa000001->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=3068.774s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Delete router interface:

cookie=0x0, duration=257.230s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x434,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:0f:db:b9->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e0fdbb9->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=3208.491s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Delete network (tenant1):

cookie=0x0, duration=291.280s, table=20, n_packets=0, n_bytes=0, priority=1024,arp,tun_id=0x434,arp_tpa=10.0.0.2,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:0f:db:b9->eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e0fdbb9->NXM_NX_ARP_SHA[],load:0xa000002->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=3242.541s, table=20, n_packets=16, n_bytes=1296, priority=0 actions=goto_table:30

Comment by Sam Hague [ 12/Feb/16 ]

Hari,

Could you try the same tests using the latest build from stable/beryllium? We also found that the patch you mentioned didn't solve the problem completely. There were two other patches; with all three together, the flows looked to be cleaned up.

Comment by Sam Hague [ 12/Feb/16 ]

(In reply to Sam Hague from comment #11)
> Hari,
>
> Could you try the same tests using the latest build from stable/beryllium?
> We also found that the patch you mentioned didn't solve the problem
> completely. There were two other patches; with all three together, the
> flows looked to be cleaned up.

Hari, could you also describe your test setup - separate openstack control and compute nodes, control+compute on one node with a second compute node, etc.?

Comment by Hari Prasidh [ 15/Feb/16 ]

Attachment dump_flows&karaf_log.zip has been added with description: karaf logs, local.conf and dump flows for your reference

Comment by Hari Prasidh [ 15/Feb/16 ]

Hi Sam,

I tested with the latest build from stable/beryllium, and I noticed the ARP flows are still not cleared from table 20.

I am using one control node and two compute nodes. I've attached the local.conf for the control and compute nodes.

I deleted the networks and the router interface, but the ARP flows are still not cleared on the control node or the compute nodes.

I've attached karaf logs and a doc containing the dump flows for all 3 nodes.

Comment by Josh Hershberg [ 15/Feb/16 ]

Just to make sure: are these ARP flows for routers or for VMs? The fix in this bug relates to routers.

(In reply to hari prasad from comment #14)
> Hi Sam,
>
> I tested with the latest build from stable/beryllium, and I noticed the
> ARP flows are still not cleared from table 20.
>
> I am using one control node and two compute nodes. I've attached the
> local.conf for the control and compute nodes.
>
> I deleted the networks and the router interface, but the ARP flows are
> still not cleared on the control node or the compute nodes.
>
> I've attached karaf logs and a doc containing the dump flows for all 3 nodes.

Comment by Hari Prasidh [ 15/Feb/16 ]

(In reply to Joshua from comment #15)
> Just to make sure: are these ARP flows for routers or for VMs? The fix in
> this bug relates to routers.

Hi Joshua,
Those ARP flows are related to the DHCP port. You can find the dump flows in the attached doc.

Comment by Josh Hershberg [ 23/Feb/16 ]

The remaining issue, the ARP flows for DHCP ports, is really a separate issue. Opening a new bug for that one: Bug-5408.

Comment by Josh Hershberg [ 23/Feb/16 ]

New bug opened for the last lingering ARP flows, which are really a different bug. See Bug-5408.

Comment by Sam Hague [ 23/Feb/16 ]

(In reply to Joshua from comment #18)
> New bug opened for the last lingering ARP flows, which are really a
> different bug. See Bug-5408.

Josh, can this bug be closed now?

Comment by Hari Prasidh [ 04/Mar/16 ]

Hi,

I tested with the latest build; the bug is fixed for the ARP responder service.

Can we close this bug, or is there a bug fix still needed for the ICMP echo responder?
