[NETVIRT-130] Netvirt fails to add all the ODL nodes as controllers to br-int Created: 09/Sep/16  Updated: 09/Mar/18  Resolved: 09/Mar/18

Status: Resolved
Project: netvirt
Component/s: None
Affects Version/s: Boron
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Venkatrangan Govindarajan Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File inventory_contents.json     File odl1_log.tgz     File odl2_log.tgz     File odl3_log.tgz     File topology.json     File topology_config.json    
External issue ID: 6685

 Description   

Image Tested: https://nexus.opendaylight.org/content/repositories/autorelease-1484/org/opendaylight/integration/distribution-karaf/0.5.0-Boron/distribution-karaf-0.5.0-Boron.zip

Step 1: Set up a 3-node ODL cluster

Step 2: Stack three OpenStack nodes

Step 3: On one of the nodes, sudo ovs-vsctl show displayed br-int with only one ODL node added as controller.

This is a bug because if that ODL node goes down, br-int on that compute node has no other controller to fall back on, and the instances on that node will not work properly.

Please note: a failover was performed on ODL1 for some testing, and the entity owners were collected after that failover, so they do not reflect the situation before the failover, i.e. when the problem occurred. The karaf logs should still be helpful.

When the ODL managers are set in local.conf as ODL_OVS_MANAGERS=10.128.0.9,10.128.0.5,10.128.0.6

the expectation is that any br-int created by Netvirt will have all of these ODL nodes set as OpenFlow controllers.

Among the 3 OpenStack nodes, br-int on one of the nodes had only 10.128.0.9 as its OpenFlow controller.
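
The expectation above can be checked mechanically. The following is a minimal sketch, not part of Netvirt: the function names are hypothetical, and the OpenFlow port 6653 is an assumption taken from the ovs-vsctl output in this report. It derives the expected controller targets from the ODL_OVS_MANAGERS value and compares them with the Controller lines listed under br-int in saved `ovs-vsctl show` text.

```python
import re

# Assumption: OpenFlow port as seen in the ovs-vsctl output in this report.
OPENFLOW_PORT = 6653

ODL_OVS_MANAGERS = "10.128.0.9,10.128.0.5,10.128.0.6"

# Trimmed copy of the compute1 output from this report.
COMPUTE1_SHOW = """\
01ef54eb-758e-437c-90f0-c7ff69557873
    Manager "tcp:10.128.0.6:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.128.0.9:6653"
        fail_mode: secure
"""


def expected_controllers(odl_ovs_managers, port=OPENFLOW_PORT):
    """Turn a comma-separated ODL_OVS_MANAGERS value into controller targets."""
    return {
        f"tcp:{ip.strip()}:{port}"
        for ip in odl_ovs_managers.split(",")
        if ip.strip()
    }


def configured_controllers(show_output, bridge="br-int"):
    """Collect Controller targets listed under one bridge in `ovs-vsctl show` text."""
    targets = set()
    in_bridge = False
    for line in show_output.splitlines():
        stripped = line.strip()  # works with or without indentation
        if stripped.startswith("Bridge "):
            in_bridge = stripped.split(None, 1)[1].strip('"') == bridge
        elif in_bridge:
            match = re.match(r'Controller "([^"]+)"', stripped)
            if match:
                targets.add(match.group(1))
    return targets


def missing_controllers(odl_ovs_managers, show_output, bridge="br-int"):
    """Return the expected controller targets that the bridge does not list."""
    expected = expected_controllers(odl_ovs_managers)
    return sorted(expected - configured_controllers(show_output, bridge))


print(missing_controllers(ODL_OVS_MANAGERS, COMPUTE1_SHOW))
```

For the compute1 sample this reports tcp:10.128.0.5:6653 and tcp:10.128.0.6:6653 as missing, which matches the symptom described here.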



 Comments   
Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment odl1_log.tgz has been added with description: ODL1 Logs

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment odl2_log.tgz has been added with description: ODL2 logs

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment inventory_contents.json has been added with description: Inventory DS contents

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment topology_config.json has been added with description: Topology Config created by Netvirt

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment topology.json has been added with description: Operational topology from device

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Attachment odl3_log.tgz has been added with description: ODL3 logs

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

The ovs-vsctl show output from the three Openstack nodes

Control Node:

[gvrangan@openstack-control-node devstack]$ sudo ovs-vsctl show
750bdcf3-e21e-46b3-9ff0-7af076be63ec
    Manager "tcp:10.128.0.6:6640"
        is_connected: true
    Manager "tcp:10.128.0.9:6640"
    Manager "tcp:10.128.0.5:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.128.0.9:6653"
        Controller "tcp:10.128.0.6:6653"
            is_connected: true
        Controller "tcp:10.128.0.5:6653"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tap7565b826-9a"
            Interface "tap7565b826-9a"
                type: internal
        Port "vxlan-10.128.0.4"
            Interface "vxlan-10.128.0.4"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.3", remote_ip="10.128.0.4"}
        Port "vxlan-10.128.0.7"
            Interface "vxlan-10.128.0.7"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.3", remote_ip="10.128.0.7"}
    ovs_version: "2.5.0"

[gvrangan@compute2 devstack]$ sudo ovs-vsctl list open_vswitch
_uuid : 60e0c199-12af-4b7a-92b1-d158d4760c54
bridges : [0f13142f-bb2d-4c74-8257-65e5f1dd98d6]
cur_cfg : 173
datapath_types : [netdev, system]
db_version : "7.12.1"
external_ids : {system-id="d4daf026-fe1f-40f2-a560-4e259f76e368"}
iface_types : [geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
manager_options : [a81279da-4528-4839-b130-178efc8ead23, d4ac1d84-e920-4017-91a7-af016701a600, f617d461-52b9-40bc-b477-392aacbecd1b]
next_cfg : 173
other_config : {local_ip="10.128.0.7"}
ovs_version : "2.5.0"
ssl : []
statistics : {}
system_type : CentOS
system_version : "7.2.1511-Core"

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Node in which the problem was seen

[gvrangan@compute1 devstack]$ sudo ovs-vsctl show
01ef54eb-758e-437c-90f0-c7ff69557873
    Manager "tcp:10.128.0.6:6640"
        is_connected: true
    Manager "tcp:10.128.0.9:6640"
    Manager "tcp:10.128.0.5:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.128.0.9:6653"
        fail_mode: secure
        Port "vxlan-10.128.0.7"
            Interface "vxlan-10.128.0.7"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.4", remote_ip="10.128.0.7"}
        Port br-int
            Interface br-int
                type: internal
        Port "vxlan-10.128.0.3"
            Interface "vxlan-10.128.0.3"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.4", remote_ip="10.128.0.3"}
        Port "tap0682360f-2c"
            Interface "tap0682360f-2c"
        Port "tap5169fc21-5e"
            Interface "tap5169fc21-5e"
        Port "tap39c5eefd-f3"
            Interface "tap39c5eefd-f3"
    ovs_version: "2.5.0"
[gvrangan@compute1 devstack]$
[gvrangan@compute1 devstack]$
[gvrangan@compute1 devstack]$
[gvrangan@compute1 devstack]$ sudo ovs-vsctl list open_vswitch
_uuid : 01ef54eb-758e-437c-90f0-c7ff69557873
bridges : [1874541a-631c-4740-aa1c-768c021f6f2f]
cur_cfg : 195
datapath_types : [netdev, system]
db_version : "7.12.1"
external_ids : {system-id="3087f2d9-7a70-4490-8ca7-592d993bc987"}
iface_types : [geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
manager_options : [5b1f2aa6-1181-4a41-8564-c3154811ebb5, 764b661e-4e9d-498a-92e1-35ba954eeabb, c7736537-ebf0-433d-9cc7-85208c2ceaa4]
next_cfg : 195
other_config : {local_ip="10.128.0.4"}
ovs_version : "2.5.0"
ssl : []
statistics : {}
system_type : CentOS
system_version : "7.2.1511-Core"

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

Final Node

[gvrangan@compute2 devstack]$ sudo ovs-vsctl list open_vswitch
_uuid : 60e0c199-12af-4b7a-92b1-d158d4760c54
bridges : [0f13142f-bb2d-4c74-8257-65e5f1dd98d6]
cur_cfg : 173
datapath_types : [netdev, system]
db_version : "7.12.1"
external_ids : {system-id="d4daf026-fe1f-40f2-a560-4e259f76e368"}
iface_types : [geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
manager_options : [a81279da-4528-4839-b130-178efc8ead23, d4ac1d84-e920-4017-91a7-af016701a600, f617d461-52b9-40bc-b477-392aacbecd1b]
next_cfg : 173
other_config : {local_ip="10.128.0.7"}
ovs_version : "2.5.0"
ssl : []
statistics : {}
system_type : CentOS
system_version : "7.2.1511-Core"

[gvrangan@compute2 devstack]$
[gvrangan@compute2 devstack]$
[gvrangan@compute2 devstack]$
[gvrangan@compute2 devstack]$ sudo ovs-vsctl show
60e0c199-12af-4b7a-92b1-d158d4760c54
    Manager "tcp:10.128.0.5:6640"
        is_connected: true
    Manager "tcp:10.128.0.9:6640"
    Manager "tcp:10.128.0.6:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.128.0.9:6653"
        Controller "tcp:10.128.0.5:6653"
            is_connected: true
        Controller "tcp:10.128.0.6:6653"
            is_connected: true
        fail_mode: secure
        Port "vxlan-10.128.0.3"
            Interface "vxlan-10.128.0.3"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.7", remote_ip="10.128.0.3"}
        Port "vxlan-10.128.0.4"
            Interface "vxlan-10.128.0.4"
                type: vxlan
                options: {key=flow, local_ip="10.128.0.7", remote_ip="10.128.0.4"}
        Port br-int
            Interface br-int
                type: internal
        Port "tap5d6805d0-95"
            Interface "tap5d6805d0-95"
        Port "tap7749d598-b4"
            Interface "tap7749d598-b4"
    ovs_version: "2.5.0"

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

(In reply to Venkatrangan Govindarajan from comment #0)

Also note that I had created some instances for testing.

Comment by Venkatrangan Govindarajan [ 09/Sep/16 ]

The topology information and inventory information were collected only after the failover.

Comment by ranjithkumar_t [ 19/Sep/16 ]

I have analysed the karaf logs attached to this bug. The following is my assessment.

• The node 10.128.0.9 is not a member of the cluster nodes.
• The node took more than 2 minutes to connect to the cluster, so it reached the unreachable state.
• Possibly the node had gone down and the karaf process had failed.

Entries observed in the karaf logs:
=============================

2016-09-09 18:46:00,472 | WARN | ult-dispatcher-5 | ReliableDeliverySupervisor | 154 - com.typesafe.akka.slf4j - 2.4.7 | Association with remote system [akka.tcp://opendaylight-cluster-data@10.128.0.9:2550] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://opendaylight-cluster-data@10.128.0.9:2550]] Caused by: [Connection refused: /10.128.0.9:2550]

2016-09-09 19:52:55,245 | INFO | ult-dispatcher-7 | kka://opendaylight-cluster-data) | 154 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.128.0.5:2550] - Leader can currently not perform its duties, reachability status: [akka.tcp://opendaylight-cluster-data@10.128.0.5:2550 -> akka.tcp://opendaylight-cluster-data@10.128.0.9:2550: Unreachable [Unreachable] (1), akka.tcp://opendaylight-cluster-data@10.128.0.6:2550 -> akka.tcp://opendaylight-cluster-data@10.128.0.9:2550: Unreachable [Unreachable] (1)], member status: [akka.tcp://opendaylight-cluster-data@10.128.0.5:2550 Up seen=true, akka.tcp://opendaylight-cluster-data@10.128.0.6:2550 Up seen=true, akka.tcp://opendaylight-cluster-data@10.128.0.9:2550 Up seen=false]

[_value=tcp:10.128.0.6:6653], isIsConnected=true, augmentations={}}, ControllerEntry{getControllerUuid=Uuid [_value=b56f65ee-999a-48f1-ae43-1e529c5cb48a], getTarget=Uri [_value=tcp:10.128.0.5:6653], isIsConnected=true, augmentations={}}, ControllerEntry{getControllerUuid=Uuid [_value=82d358f0-629d-44da-8cd0-923002bd68f5], getTarget=Uri [_value=tcp:10.128.0.9:6653], isIsConnected=false, augmentations={}}], getDatapathId=DatapathId [_value=00:00:b4:9c:52:d0:cb:95]

I have tested this manually many times, but the bug could not be reproduced.
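
The ControllerEntry fragment quoted from the karaf log above can be summarized per controller. This is a minimal sketch with a hypothetical helper name; the regular expression is an assumption based only on the text shape of that fragment, not on any documented log format. It maps each controller target to its isIsConnected value.

```python
import re

# Matches "[_value=tcp:<ip>:<port>], isIsConnected=<bool>" as it appears in the
# quoted fragment. Uuid values are skipped because they do not start with "tcp:".
ENTRY_RE = re.compile(r"\[_value=(tcp:[^\]]+)\], isIsConnected=(true|false)")

# The fragment quoted in this comment (it is truncated at the start in the report).
LOG_FRAGMENT = (
    "[_value=tcp:10.128.0.6:6653], isIsConnected=true, augmentations={}}, "
    "ControllerEntry{getControllerUuid=Uuid "
    "[_value=b56f65ee-999a-48f1-ae43-1e529c5cb48a], getTarget=Uri "
    "[_value=tcp:10.128.0.5:6653], isIsConnected=true, augmentations={}}, "
    "ControllerEntry{getControllerUuid=Uuid "
    "[_value=82d358f0-629d-44da-8cd0-923002bd68f5], getTarget=Uri "
    "[_value=tcp:10.128.0.9:6653], isIsConnected=false, augmentations={}}], "
    "getDatapathId=DatapathId [_value=00:00:b4:9c:52:d0:cb:95]"
)


def controller_states(log_text):
    """Map each controller target found in the log text to its connection state."""
    return {
        target: state == "true"
        for target, state in ENTRY_RE.findall(log_text)
    }


for target, connected in controller_states(LOG_FRAGMENT).items():
    print(target, "connected" if connected else "not connected")
```

For this fragment the bridge lists all three controllers, with only tcp:10.128.0.9:6653 not connected, which is consistent with 10.128.0.9 being unreachable in the cluster logs.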

Comment by Sam Hague [ 09/Mar/18 ]

Legacy NetVirt is deprecated.
