[GENIUS-28] VxLAN Tunnel endpoints sometimes missing
Created: 17/Oct/16  Updated: 30/Oct/17  Resolved: 05/Dec/16

Status: Resolved
Project: genius
Component/s: General
Affects Version/s: (unspecified)
Fix Version/s: None

Type: Bug
Reporter: Romanos Skiadas
Assignee: Hema Gopalakrishnan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: autotunnels.zip, compute1.txt, compute2.txt, controller.txt, itm-transport-zones_after.xml, itm-transport-zones_before.xml, karaf.log, karaf_logaa, karaf_logab, karaf_logac, ovs_logs
Issue Links:
Duplicate
is duplicated by GENIUS-30 Auto tunnels feature doesn't config a... Resolved
External issue ID: 6952

 Description   

In an OpenStack-ODL setup with one OpenStack/ODL controller and two compute nodes, compute A is properly connected to the controller but compute B is not. The tunnel between the two computes is also missing.
As a result, instances deployed on A can get an IP but instances on B cannot, even though they are in the same network.
On redeployment, A and B are sometimes connected both to each other and to the controller and everything works; at other times one or both of them are not.



 Comments   
Comment by Romanos Skiadas [ 17/Oct/16 ]

This happens using Boron 0.5.1-20161017.001225-477

Comment by Romanos Skiadas [ 17/Oct/16 ]

Attachment compute1.txt has been added with description: Compute 1 (working) flows, ovs show

Comment by Romanos Skiadas [ 17/Oct/16 ]

Attachment compute2.txt has been added with description: Compute 2 (not working) ovs show, flows

Comment by Romanos Skiadas [ 17/Oct/16 ]

Attachment controller.txt has been added with description: Controller ovs show, flows, groups

Comment by Romanos Skiadas [ 17/Oct/16 ]

Attachment karaf.log has been added with description: Karaf log

Comment by Romanos Skiadas [ 08/Nov/16 ]

This is either a problem on the OPNFV side or a misunderstanding on my part. Either way, I'd like to mark it as INVALID but can't.

Comment by Romanos Skiadas [ 21/Nov/16 ]

Disregard the previous comment, I'll add more logs from a deployment with SR1, one OpenStack controller and two computes

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment karaf_logaa has been added with description: Karaf log with genius set to trace, 2nd set of logs, 1/3

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment karaf_logab has been added with description: Karaf log 2/3

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment karaf_logac has been added with description: Karaf log 3/3

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment ovs_logs has been added with description: ovs-vsctl show for controller and computes

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment itm-transport-zones_after.xml has been added with description: restconf/config/itm:transport-zones before modifying environment

Comment by Romanos Skiadas [ 21/Nov/16 ]

Attachment itm-transport-zones_before.xml has been added with description: restconf/config/itm:transport-zones after modifying environment

Comment by Romanos Skiadas [ 21/Nov/16 ]

https://gerrit.opnfv.org/gerrit/#/c/23971/3/test/functest/testcase_6.py
This is what was being run when the DHCP failures occurred. Instances 'sdnvpn-6-5' and 'sdnvpn-6-3' failed to get an IP.

Comment by Romanos Skiadas [ 22/Nov/16 ]

This happens with odl-netvirt-openstack, I am not using the old vpnservice features.

Comment by Alon Kochba [ 23/Nov/16 ]

It seems this bug stems from the use of the same TEP in multiple Genius ITM transport zones.
According to Genius, this is missing functionality and it will be corrected to support all use cases.

Hema G will update accordingly.

Comment by Tali Ben-Meir [ 24/Nov/16 ]

Attachment autotunnels.zip has been added with description: Tunnels not created

Comment by Vishal Thapar [ 25/Nov/16 ]

Issue is with dpnendpoints. http://10.0.43.238:8181/restconf/config/itm-state:dpn-endpoints

It assumes a DPN is a member of a single TZ, so it is unable to create the mesh if a DPN ends up a member of multiple TZs. The transport-zone field in dpn-endpoints should be a list, and the DPN should form a mesh with all tunnels in all of those transport zones.
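
The failure mode described in this comment can be illustrated with a small model. This is only a sketch, not Genius code: the function names `mesh_multi_tz` and `mesh_single_tz`, the DPN ids, and the zone names are all hypothetical, and the "last zone written wins" behaviour is an assumption made for illustration. It contrasts recording a single transport zone per DPN with recording a list of zones.

```python
from itertools import combinations

# Hypothetical membership: transport zone name -> DPN ids with a TEP in that zone.
# DPN 2 is deliberately a member of two zones.
ZONES = {
    "tz-a": [1, 2],
    "tz-b": [2, 3],
}

def mesh_multi_tz(zones):
    """Tunnel pairs when a DPN may belong to many zones: full mesh inside every zone."""
    pairs = set()
    for members in zones.values():
        pairs.update(combinations(sorted(set(members)), 2))
    return pairs

def mesh_single_tz(zones):
    """Tunnel pairs when each DPN remembers only one zone.

    Assumption for illustration: a later zone overwrites an earlier one.
    """
    zone_of = {}
    for name, members in zones.items():
        for dpn in members:
            zone_of[dpn] = name
    by_zone = {}
    for dpn, name in zone_of.items():
        by_zone.setdefault(name, []).append(dpn)
    return mesh_multi_tz(by_zone)

# Pairs that exist in the intended mesh but are lost with single-zone bookkeeping.
missing = mesh_multi_tz(ZONES) - mesh_single_tz(ZONES)
print(sorted(missing))
```

In this toy model the tunnel between DPN 1 and DPN 2 is lost because DPN 2 is only remembered as a member of `tz-b`, which matches the symptom of one endpoint sometimes having no tunnels.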

Comment by Alon Kochba [ 27/Nov/16 ]

(In reply to Vishal Thapar from comment #18)

What are our thoughts on fixing this by SR2?
Would it be possible to fix in Genius by that timeline?

The alternative seems to be avoiding the use of the same TEP in multiple TZs.
netvirt can be modified to always use the same TZ when adding TEPs, which would kill the minimized mesh optimization but should keep automatic tunnels possible without hitting the Genius bug.

Please keep us posted, it is preferable to fix in Genius and maintain current netvirt behavior if possible.
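
The trade-off of that workaround can be sketched with a rough tunnel count. This is an illustrative model only, assuming a full mesh is built inside each transport zone; the helper `tunnel_pairs`, the zone names, and the DPN ids are hypothetical and are not taken from netvirt or Genius.

```python
from itertools import combinations

def tunnel_pairs(zones):
    """Return the set of DPN pairs that get a tunnel, assuming a full mesh per zone."""
    pairs = set()
    for members in zones.values():
        pairs.update(combinations(sorted(set(members)), 2))
    return pairs

# Hypothetical layout: four DPNs, grouped into two zones (minimized mesh) ...
per_zone = {"tz-a": [1, 2], "tz-b": [3, 4]}
# ... versus the workaround of placing every TEP in one shared zone.
shared = {"tz-shared": [1, 2, 3, 4]}

print(len(tunnel_pairs(per_zone)), len(tunnel_pairs(shared)))
```

With one shared zone every DPN pair gets a tunnel, so the number of tunnels grows quadratically with the number of DPNs, which is the optimization that would be given up.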

Comment by Vishal Thapar [ 27/Nov/16 ]

(In reply to Alon Kochba from comment #19)

Yep, the plan is to fix this in the SR2 timeframe. Hema is working on a fix, though it is a bit tricky. However, I ended up coming up with a patch while trying to test my OFTunnel changes.

https://git.opendaylight.org/gerrit/48716

It is not done [I haven't touched the deletion code at all], but the add code works and seems to be enough to fix the issue. If you can test this patch and confirm or deny that it fixes the create part, then we can at least try to get this in for SR2.

Comment by Alon Kochba [ 27/Nov/16 ]

(In reply to Vishal Thapar from comment #20)

Thanks a lot for this Vishal.
Confirmed that this fixes the issue, added a comment to the gerrit.

Comment by Vishal Thapar [ 30/Nov/16 ]

First patch fixing add use cases:

Master: https://git.opendaylight.org/gerrit/48716
stable/boron: https://git.opendaylight.org/gerrit/#/c/48817/

Comment by Alon Kochba [ 04/Dec/16 ]

Closing, as this bug was resolved.

Deletion of TEPs will be implemented in https://git.opendaylight.org/gerrit/#/c/48952, and its use by netvirt should be handled in https://bugs.opendaylight.org/show_bug.cgi?id=6560

Generated at Wed Feb 07 19:59:44 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.