[VTN-43] VTN Manager failed to forward packets occasionally. Created: 29/Sep/14  Updated: 03/May/18  Resolved: 02/Oct/14

Status: Resolved
Project: vtn
Component/s: VTN Manager
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Hideyuki Tai Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File vtn_test_script.sh.sh    
External issue ID: 2107

 Description   

VTN Manager failed to forward packets occasionally.

Due to this issue, the CSIT ping tests run by the Integration Group fail occasionally.

https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-vtn-only/
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-vtn-all/

The issue has been discussed on the mailing list.
https://lists.opendaylight.org/pipermail/vtn-dev/2014-September/000625.html



 Comments   
Comment by Hideyuki Tai [ 30/Sep/14 ]

[How to reproduce the bug]

The following procedure does NOT always reproduce the issue.
To reproduce the issue, you have to do the procedure several times.

Step 1. Run Karaf.

Step 2. Install VTN Manager features.

feature:install odl-vtn-manager-all odl-adsal-compatibility-all odl-openflowplugin-all

Step 3. Run Mininet.

sudo mn --controller=remote,ip=HOST --topo tree,2

Step 4. Create a virtual network.

I'm attaching the script for it.

Or you can get the script from the following URL.
https://gist.github.com/anonymous/b0ae9794fc97da5eb4c0

Step 5. Try to send ping packets.

h1 ping h3

The above ping packets should be forwarded.
However, due to the bug, forwarding occasionally fails.

If you don't hit the issue, stop Mininet and Karaf, then repeat steps 1, 3, and 5 above.
(You don't have to repeat steps 2 and 4.)
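Because the failure is intermittent, the retry loop can be automated. The following is only a minimal sketch in Python: `run_once` is a hypothetical callable (not part of VTN or Mininet) that would perform steps 1, 3, and 5 and return True when the ping succeeded.

```python
from typing import Callable, Optional


def first_failing_attempt(run_once: Callable[[], bool],
                          max_attempts: int = 20) -> Optional[int]:
    """Call run_once() until it reports a failed ping.

    run_once is a hypothetical hook: it is assumed to restart Karaf and
    Mininet, send the ping, and return True if the ping was forwarded.
    Returns the 1-based number of the first failing attempt, or None if
    all max_attempts attempts succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if not run_once():
            return attempt
    return None
```

A return value of None after many attempts does not prove the bug is gone; it only means it was not hit in that many runs.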

Comment by Hideyuki Tai [ 30/Sep/14 ]

Attachment vtn_test_script.sh.sh has been added with description: Script to create a virtual network. (See comment 1)

Comment by Hideyuki Tai [ 30/Sep/14 ]

As Shigeru said in the following mail, it seems that this issue is caused by asynchronous notification of data change events.

https://lists.opendaylight.org/pipermail/vtn-dev/2014-September/000639.html

It seems that the OpenFlow plugin notifies MD-SAL of events of switches in the correct order.
However, MD-SAL processes these notifications in different threads.
Because the kernel thread scheduler does not guarantee the execution order of these threads, the order in which an MD-SAL application receives these notifications is not stable.
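The effect described above can be illustrated outside of OpenDaylight. The sketch below is plain Python, not MD-SAL code: the event names are made up, and the `sleep` calls only stand in for scheduling jitter. It shows that events dispatched in a fixed order can reach a listener in a different order when they are handled by several threads, while a single worker preserves the order.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical events in dispatch order; the delay simulates how long the
# scheduler keeps each handler thread from running.
EVENTS = [("node-added", 0.3), ("port-added", 0.15), ("port-status", 0.0)]


def deliver(events, workers):
    """Dispatch events in list order and return the order the listener saw."""
    seen = []
    lock = threading.Lock()

    def handle(name, delay):
        time.sleep(delay)  # simulated scheduling jitter
        with lock:
            seen.append(name)

    # Leaving the "with" block waits for all submitted handlers to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, delay in events:
            pool.submit(handle, name, delay)
    return seen


if __name__ == "__main__":
    print("one worker:    ", deliver(EVENTS, workers=1))
    print("three workers: ", deliver(EVENTS, workers=3))
```

With one worker the listener sees the dispatch order. With three workers the simulated delays let later events overtake earlier ones, which is the kind of reordering suspected here.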

Does it mean that this issue is caused by the OpenFlow plugin or MD-SAL?

Comment by Hideyuki Tai [ 30/Sep/14 ]

Hi Ed,

It seems to me that the VTN Neutron bundle is causing problems in your setup.

Therefore, could you install VTN features as follows instead of odl-vtn-manager-all?

feature:install odl-vtn-manager-java-api odl-vtn-manager-northbound
feature:install odl-adsal-compatibility-all odl-openflowplugin-all

This does not install the odl-vtn-manager-neutron feature.

Comment by Luis Gomez [ 01/Oct/14 ]

So tonight I tested this bug in more detail, and it looks like the root cause is CONTROLLER-904 (missing node connectors in the AD-SAL switch manager).

BR/Luis

Comment by Luis Gomez [ 01/Oct/14 ]

The details of the test:

VTN gets the node connector s2-eth2 when node registers:

2014-09-30 08:16:28,835 | INFO | notification-113 | VTNManagerImpl | 218 - org.opendaylight.vtn.manager.implementation - 0.2.0.SNAPSHOT | default: addPort: New port: port=OF|2@OF|00:00:00:00:00:00:00:02, prop=PortProperty[name=s2-eth2,cost=1000,enabled]

However, the AD-SAL switch manager does not show this port:
{
  "nodeConnectorProperties": [
    {
      "properties": {
        "name": { "value": "s2" },
        "state": { "value": 1 },
        "config": { "value": 1 }
      },
      "nodeconnector": {
        "node": { "id": "00:00:00:00:00:00:00:02", "type": "OF" },
        "id": "0",
        "type": "SW"
      }
    },
    {
      "properties": {
        "name": { "value": "s2-eth1" },
        "state": { "value": 1 },
        "config": { "value": 1 },
        "bandwidth": { "value": 10000000000 }
      },
      "nodeconnector": {
        "node": { "id": "00:00:00:00:00:00:00:02", "type": "OF" },
        "id": "1",
        "type": "OF"
      }
    },
    {
      "properties": {
        "name": { "value": "s2-eth3" },
        "state": { "value": 1 },
        "config": { "value": 1 },
        "bandwidth": { "value": 10000000000 }
      },
      "nodeconnector": {
        "node": { "id": "00:00:00:00:00:00:00:02", "type": "OF" },
        "id": "3",
        "type": "OF"
      }
    }
  ]
}
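The missing port can also be spotted mechanically. The following is a small sketch in Python, assuming the response has the shape shown above. `RESPONSE` is an abbreviated copy that keeps only the fields the check reads, and the expected port list is an assumption built from the VTN log line (s2-eth2) plus the ports the switch manager did return.

```python
import json

# Abbreviated copy of the switch manager response: only the port names and
# node connector ids/types are kept.
RESPONSE = """
{"nodeConnectorProperties": [
  {"properties": {"name": {"value": "s2"}},
   "nodeconnector": {"id": "0", "type": "SW"}},
  {"properties": {"name": {"value": "s2-eth1"}},
   "nodeconnector": {"id": "1", "type": "OF"}},
  {"properties": {"name": {"value": "s2-eth3"}},
   "nodeconnector": {"id": "3", "type": "OF"}}
]}
"""

# Assumed expectation: the ports VTN Manager or the switch manager reported.
EXPECTED_PORTS = ["s2-eth1", "s2-eth2", "s2-eth3"]


def port_names(response_text):
    """Return the set of port names listed in a switch manager response."""
    data = json.loads(response_text)
    return {entry["properties"]["name"]["value"]
            for entry in data["nodeConnectorProperties"]}


def missing_ports(response_text, expected):
    """Return the expected port names that the response does not list."""
    return sorted(set(expected) - port_names(response_text))


if __name__ == "__main__":
    print(missing_ports(RESPONSE, EXPECTED_PORTS))  # prints ['s2-eth2']
```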

This makes VInterface if3 go DOWN:

2014-09-30 08:16:35,241 | INFO | Thread: default | VTNManagerImpl | 218 - org.opendaylight.vtn.manager.implementation - 0.2.0.SNAPSHOT | default:vBridge-IF:Tenant1.vBridge2.if3: Virtual interface added: VInterface[name=if3,enabled,state=UNKNOWN,entityState=UNKNOWN]

2014-09-30 08:16:35,294 | INFO | Thread: default | VTNManagerImpl | 218 - org.opendaylight.vtn.manager.implementation - 0.2.0.SNAPSHOT | default:vBridge-IF:Tenant1.vBridge2.if3: Port mapping added: PortMap[config=PortMapConfig[node=OF|00:00:00:00:00:00:00:02,port=SwitchPort[name=s2-eth2],vlan=0]]

2014-09-30 08:16:35,294 | INFO | Thread: default | VTNManagerImpl | 218 - org.opendaylight.vtn.manager.implementation - 0.2.0.SNAPSHOT | default:vBridge-IF:Tenant1.vBridge2.if3: Virtual interface changed: VInterface[name=if3,enabled,state=DOWN,entityState=UNKNOWN]

And finally ping fails in the test:

Ping h2 to h4 :: Ping h2 to h4, verify no packet loss | FAIL |
'PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
From 10.0.0.2 icmp_seq=1 Destination Host Unreachable
From 10.0.0.2 icmp_seq=2 Destination Host Unreachable
From 10.0.0.2 icmp_seq=3 Destination Host Unreachable

--- 10.0.0.4 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2015ms
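For scripted runs, the pass/fail decision can be read from the statistics line. This is only a sketch in Python, not the check the CSIT suite uses: it assumes the Linux ping summary format shown above, and `parse_ping_stats` is a name made up for this example.

```python
import re

# Matches the summary line of Linux ping, with or without "+N errors".
_STATS = re.compile(
    r"(\d+) packets transmitted, (\d+) received.*?(\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_stats(output):
    """Return (transmitted, received, loss_percent), or None if not found."""
    match = _STATS.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


SAMPLE = ("3 packets transmitted, 0 received, +3 errors, "
          "100% packet loss, time 2015ms")

if __name__ == "__main__":
    print(parse_ping_stats(SAMPLE))  # prints (3, 0, 100.0)
```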

BR/Luis

Comment by Ed Warnicke [ 01/Oct/14 ]

Fix here:

https://git.opendaylight.org/gerrit/#/c/11705

So far it has succeeded 8 out of 8 tries for me

Comment by Hideyuki Tai [ 01/Oct/14 ]

(In reply to Ed Warnicke from comment #7)
> Fix here:
>
> https://git.opendaylight.org/gerrit/#/c/11705
>
>
> So far it has succeeded 8 out of 8 tries for me

Hi Ed,

Thank you for submitting the patch.

I've tested the patch, and it has succeeded in 10 out of 10 tries in my setup.
I think this patch fixes the bug.

Comment by Colin Dixon [ 02/Oct/14 ]

Per Hideyuki's comments, this is fixed for him.
