[NETVIRT-774] Ping between the VM instances occasionally fails with Ocata. Created: 07/Jul/17  Updated: 30/Oct/17  Resolved: 13/Sep/17

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Reporter: Hari Prasidh Assignee: Hari Prasidh
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive karaf_log.zip    
External issue ID: 8823

 Description   

I tested with the latest Carbon and observed ping failures between the VM instances.

Procedure:
Create 2 networks, then create 4 VM instances on each network.
Observed: ping fails between VMs within the same network.

Please find the karaf log and sandbox log attached.



 Comments   
Comment by Hari Prasidh [ 07/Jul/17 ]

Attachment karaf_log.zip has been added with description: karaf_logs

Comment by Hari Prasidh [ 24/Jul/17 ]

Reproduced the bug in the sandbox for the above scenario and observed that there are no tunnel ports between the compute nodes.

After analysing the logs, I found a delay in updating the DPN Tunnel End Point information (DPNTEPsInfo) in the DataStoreCache. Because of this delay, the tunnel interface is not created between the compute nodes.

I suspect the delay occurs because the update job sits in the queue and is not processed in time.

Comment by Hari Prasidh [ 26/Jul/17 ]

1. VM creation on compute1 (the first VM on compute node 1).
2. The VM tap port notification is received in the netvirt and genius modules.
3. Tunnel ports are created bidirectionally between compute1 and control. (Creating the tunnel interfaces requires the DPNTEPsInfo of both compute1 and the control node, which must be fetched from the Meshed Cache.)
NOTE:
Before this, when the network is created, the control node's DPNTEPsInfo is updated in the cache.
The compute1 DPN ID is available in the VM creation request itself.

4. Once the tunnel interface is created, the compute1 DPNTEPsInfo is updated in the cache (Meshed Cache).
5. VM creation on compute2 (the first VM on compute node 2).
6. The VM tap port notification is received in the netvirt and genius modules.
7. Tunnel ports are created bidirectionally between compute2 and control, and between compute2 and compute1.
To create the tunnel between compute2 and compute1, we need the DPNTEPsInfo of compute1, which should have been updated in step 4.
NOTE:
The compute2 DPN ID is available in the VM creation request itself.

Step (1) and step (5) can happen in parallel.
If step (7) executes before step (4), the DPNTEPsInfo of compute1 is not yet available.
As a result, tunnel interface creation fails between the compute1 and compute2 nodes.

Note: Tunnel interface creation is a one-time operation for each node,
i.e. tunnel interface creation is allowed only for the very first port interface on the node.
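
The ordering problem in the steps above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class TepCacheRaceSketch, its methods, the node names and the IP addresses are all hypothetical and are not taken from genius or netvirt. It simply shows that a lookup made in step 7 misses compute1 when step 4 has not yet populated the cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model only; none of these names come from genius or netvirt.
class TepCacheRaceSketch {
    // Hypothetical stand-in for the cache of DPNTEPsInfo, keyed by node name.
    private final Map<String, String> tepCache = new ConcurrentHashMap<>();
    private final List<String> tunnels = new ArrayList<>();

    // Step 4 (and the network-creation update for control): publish a node's TEP info.
    void publishTep(String node, String tepIp) {
        tepCache.put(node, tepIp);
    }

    // Steps 3 and 7: build a tunnel to each peer whose TEP info is already cached.
    // A peer missing from the cache is skipped; since tunnel creation is described
    // as a one-time operation per node, this model never retries it.
    void createTunnels(String node, List<String> peers) {
        for (String peer : peers) {
            if (tepCache.containsKey(peer)) {
                tunnels.add(node + "<->" + peer);
            }
        }
    }

    // Replays the steps in one of the two possible orders and returns the tunnels built.
    static List<String> run(boolean step7BeforeStep4) {
        TepCacheRaceSketch s = new TepCacheRaceSketch();
        s.publishTep("control", "10.0.0.1");                      // at network creation
        s.createTunnels("compute1", Arrays.asList("control"));    // step 3
        if (!step7BeforeStep4) {
            s.publishTep("compute1", "10.0.0.2");                 // step 4 in time
        }
        s.createTunnels("compute2", Arrays.asList("control", "compute1")); // step 7
        if (step7BeforeStep4) {
            s.publishTep("compute1", "10.0.0.2");                 // step 4 arrives late
        }
        return s.tunnels;
    }

    public static void main(String[] args) {
        System.out.println("step 4 first: " + run(false));
        System.out.println("step 7 first: " + run(true));
    }
}
```

In this model, the run with step 4 first yields three tunnels, while the run with step 7 first yields only the two tunnels to control, which matches the missing compute1/compute2 tunnel observed in the sandbox.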

The code snippet below creates the tunnel interface and updates the DPNTEPsInfo:

org.opendaylight.genius.itm.listeners.TransportZoneListener.java
-------------------------------------------
if (!newDpnTepsList.isEmpty()) {
    LOG.trace("Adding TEPs ");
    ItmTepAddWorker addWorker = new ItmTepAddWorker(newDpnTepsList, Collections.emptyList(),
            dataBroker, idManagerService, mdsalManager, itmConfig);
    coordinator.enqueueJob(tzNew.getZoneName(), addWorker);
}

I suspect the addWorker job is processed from the queue with a delay.
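
One way such a delay could arise is sketched below. This is a hypothetical model, not the genius job coordinator: KeyedJobQueueSketch, the key "zone-a" and the cache map are invented for illustration, and the assumption that jobs sharing a key run one at a time in FIFO order is mine. Under that assumption, a job that updates the cache stays invisible to readers for as long as an earlier job with the same key is still running.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical keyed job queue: jobs with the same key run serially, in order.
class KeyedJobQueueSketch {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    Future<?> enqueueJob(String key, Runnable job) {
        return queues.computeIfAbsent(key, k -> Executors.newSingleThreadExecutor()).submit(job);
    }

    void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }

    // Returns {cache entry visible while the job is queued, visible after it has run}.
    static boolean[] demo() {
        KeyedJobQueueSketch coordinator = new KeyedJobQueueSketch();
        Map<String, String> tepCache = new ConcurrentHashMap<>();
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // An earlier job on the same key that blocks until the gate opens.
            coordinator.enqueueJob("zone-a", () -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Stand-in for the add worker: it updates the cache only when it finally runs.
            Future<?> addWorker =
                    coordinator.enqueueJob("zone-a", () -> tepCache.put("compute1", "10.0.0.2"));

            boolean visibleWhileQueued = tepCache.containsKey("compute1");
            gate.countDown();                      // let the earlier job finish
            addWorker.get(5, TimeUnit.SECONDS);    // wait for the add worker itself
            boolean visibleAfterRun = tepCache.containsKey("compute1");
            return new boolean[] {visibleWhileQueued, visibleAfterRun};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            coordinator.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] seen = demo();
        System.out.println("visible while queued: " + seen[0]);
        System.out.println("visible after run:    " + seen[1]);
    }
}
```

A reader that consults the cache between the enqueue call and the job's completion sees no entry, which is the window in which step 7 would fail to find the DPNTEPsInfo of compute1.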

Comment by Sridhar Gaddam [ 16/Aug/17 ]

@Hari, do you have any update/patch for this issue?

Comment by Hari Prasidh [ 17/Aug/17 ]

(In reply to Sridhar Gaddam from comment #4)
> @Hari, do you have any update/patch for this issue?

Yes, please find the patch below:
https://git.opendaylight.org/gerrit/#/c/61168/

Comment by A H [ 23/Aug/17 ]

A patch was submitted to revert the changes and fix this bug in Carbon SR2:

https://git.opendaylight.org/gerrit/#/c/62220/

To better assess the impact of this bug and fix, could someone from your team please help us identify the following:
Regression: Is this bug a regression of functionality/performance/feature compared to Carbon?
Severity: Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Carbon SR2 without it?
Workaround: Is there a workaround such that we can write a release note instead?
Testing: Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?
Impact: Does this fix impact any dependent projects?

Comment by Hari Prasidh [ 28/Aug/17 ]

We are currently discussing the approach to fix this.

Comment by Hari Prasidh [ 13/Sep/17 ]

After discussion with the community, the issue has been fixed by the patch below, which was merged recently.
https://git.opendaylight.org/gerrit/#/c/59035/

I have therefore abandoned the patch below.
https://git.opendaylight.org/gerrit/#/c/61168/

I have closed the bug.
