[NETVIRT-659] TCP communication is working, even remove the TCP SG rule from the VM instance. Created: 08/May/17 Updated: 03/May/18 Resolved: 05/Apr/18 |
|
| Status: | Resolved |
| Project: | netvirt |
| Component/s: | General |
| Affects Version/s: | Carbon |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Hari Prasidh | Assignee: | Vinoth B |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 8400 |
| Description |
|
Setup Used: 1. 3 node ODL cluster + HA proxy Steps followed for testing: Scenario 1: Observations: 2) Observed TCP communication is does not working with a new session for Ingress and Egress as well. Scenario 2: Observations: 2) Observed TCP communication is does not working with a new session for Ingress and Egress as well. |
| Comments |
| Comment by Arthi Bhattacharjee [ 26/May/17 ] |
|
While evaluating the bug, we have found an another issue. |
| Comment by Hari Prasidh [ 30/May/17 ] |
|
Note: I've tested the same scenario with Any SG rule (TCP/ICMP/UDP) and observed the issue. |
| Comment by Aswin Suryanarayanan [ 30/May/17 ] |
|
This is since conntrack entries are not deleted when a SG rule is deleted. We need to find a way to prevent the traffic corresponding the deleted rules not hitting +est. One way to workaround this is add a higher priority drop rule to match the traffic corresponding to deleted rule with a timeout. |
| Comment by Vinoth B [ 15/Jun/17 ] |
|
Observation - 1: Created 2 VMs with ICMP ALL rule.. When ping operation begin between 2 VMs , the below conntrack entry has been created, [1] [root@ocata-compute1 ~]# conntrack -L | grep icmp This entry will be available until we stopped the ping communication. When we remove the ICMP rule from the SG, Ongoing communication is not getting disturbed because the respective conntrack entry [1] is still exists and make the communication to be successful. Observation - 2: Executed the same test case in pure openstack and observered that ongoing communication has been stopped and respective conntrack rule also removed, when we remove the rule. |
| Comment by Aswin Suryanarayanan [ 15/Jun/17 ] |
|
As mentioned earlier we could add a higher priority rule to drop the traffic. The delete action will overwrite the commit flow with a drop flow of higher priority with a timeout. The value for timeout for each protocol can be selected from [1]. [1]https://github.com/opendaylight/netvirt/blob/master/vpnservice/aclservice/impl/src/main/yang/aclservice-config.yang |
| Comment by Vinoth B [ 19/Jun/17 ] |
|
Observation - 3: Setup : pure openstack with OVS firewall, 1. Created 1 VM with ICMP ALL ingress rule and it installs the below specified flows in the flow table. [1] cookie=0xa99821b3245d1aa6, duration=95.226s, table=82, n_packets=23, n_bytes=2254, reset_counts priority=70,ct_state=+est-rel-rpl,icmp,reg5=0x4,dl_dst=fa:16:3e:7c:07:75 actions=pop_vlan,output:4 [2] cookie=0xa99821b3245d1aa6, duration=95.225s, table=82, n_packets=1, n_bytes=98, reset_counts priority=70,ct_state=+new-est,icmp,reg5=0x4,dl_dst=fa:16:3e:7c:07:75 actions=ct(commit,zone=NXM_NX_REG6[0..15]),pop_vlan,output:4 [1] Kernal conntrack entry flow 2. Pinged from DHCP to VM and observed that the first packet is going through the SG flow and further packets are hitting the kernal conntrack entry flow. 3. When we delete the ICMP ingress rule, the on going communication has been stopped because the respective conntrack kernal entry flow is also getting deleted along with the SG flow. But in ODL, since we have only one global Kernal conntrack entry flow for all the VMs, The ongoing communication is still succeeded even though we removed the SG rule from VM. |
| Comment by Vinoth B [ 20/Jun/17 ] |
|
(In reply to Vinoth B from comment #6) Shall i proceed the way how OVS firewall handles this issue? I mean adding the conntrack-Kernel entry per rule instead of common for all the VMs. |
| Comment by Aswin Suryanarayanan [ 20/Jun/17 ] |
|
Adding a +est rule per vm defeat the purpose of using connection tracking in my opinion. An established packet need to traverse through all the rules which is added per vm before we have a hit. I would suggest adding a drop rule on deletion would be better solution. A similar proposal implemented in OVN[1],[2]. [1]https://github.com/openvswitch/ovs/commit/cc58e1f2cc414b844c36d21a9a39e4f3383e75e4 |
| Comment by Vinoth B [ 23/Jun/17 ] |
|
Aswin, The DROP flow fix is fine. Fix Explanation : When we delete a specific SG rule from the VM, netvirt should add the high priority drop rule with the exact match condition with idle timeout. This drop flow will stops the existing communication. Similarly, When we add a same rule again, netvirt should remove the added drop flow to make the existing stopped communication to be resumed. We have to look on below problem before starting the implementation. Apart from this, this approach is fine. Shall | start the implementation for this fix? |
| Comment by Vinoth B [ 10/Jul/17 ] |
|
I have pushed a draft patch to fix this issue. Netvirt: Fix provided: Added a DROP rule with idle_timeout(5 secs) when we delete or remove the SG from the VM. If we add the same rule again to the VM, Netvirt will delete the added DROP rule and make the existing communication to be resume. Limitations : 1) When we add the two similar rules to the VMs (SG1 with ALL ICMP ingress, SG2 with ALL ICMP ingress), there are two ALL ICMP ingress flow will be exists in the flow table with different priority. while removing SG1 from VM, as per the above fix, the DROP rule will be added which stops the existing ICMP communication irrespective of there is another similar rule (SG2) is still exist in the VM. 2) After establishing SSH communication between VMs, TCP doesn't send packets continuously. So the added DROP flow is getting removed from the flow table after reaching the idle timeout 5 secs. Observed the same behavior also for SCP communication |
| Comment by Aswin Suryanarayanan [ 10/Jul/17 ] |
|
The draft is not accessible. Please publish the patch or add reviewers. |
| Comment by Vinoth B [ 09/Aug/17 ] |
|
Since the fix having some limitation, I am working on the new approach which will resolve all the limitation. The new approach involves pipeline changes, So I have pushed the spec patch. |
| Comment by Sam Hague [ 05/Apr/18 ] |