[OVSDB-444] Port binding failure after rebooting compute node - hostconfig missing Created: 13/Dec/17  Updated: 02/May/19  Resolved: 02/May/19

Status: Resolved
Project: ovsdb
Component/s: Southbound.Open_vSwitch
Affects Version/s: Oxygen-SR4, Fluorine
Fix Version/s: Fluorine

Type: Bug Priority: Medium
Reporter: Victor Pickard Assignee: Victor Pickard
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File topologyAfterRebootCompute     HTML File topologyBeforeRebootCompute    
Issue Links:
Duplicate
duplicates OVSDB-455 l2 agent marked down when no response... Open
Relates
relates to NETVIRT-1178 ODL L2 Agent is dead after restarting... Resolved

 Description   

I have a Pike/Carbon TripleO-based setup.

Spin up a VM on the compute node and verify that it gets an IP and can ping. All is good.

Now delete the VM and reboot the compute node (sudo shutdown -r now).

The compute node comes back up and the tunnel between compute and control is re-established (good stuff). But when I boot a VM, port binding fails, as shown below.

It looks like networking_odl sees the agent as "dead". I tried booting the VM again and hit the same issue (I was wondering whether it just took some time for the agent to be seen as "alive" again).

This log is generated:

2017-11-29 23:52:38.233 56464 WARNING networking_odl.ml2.pseudo_agentdb_binding [req-fde69db7-2e68-4d1b-887c-3d16680cf567 74655be1d3634adc8234db3b9a35d083 231c8ce05e234185ac3db9c98e5eb3bd - default default] Refusing to bind port 13029ea7-9821-4796-a1d3-0640a8b14873 to dead pseudo agent: {'binary': u'neutron-odlagent-portbinding', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 11, 29, 23, 46, 18), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'overcloud-novacompute-0.opnfvlf.org', 'agent_type': u'ODL L2', 'resource_versions': {}, 'created_at': datetime.datetime(2017, 11, 28, 2, 40, 59), 'started_at': datetime.datetime(2017, 11, 28, 2, 40, 59), 'id': u'dddc4a58-fee2-4788-b560-433be36f64fd', 'configurations': {u'allowed_network_types': [u'local', u'vlan', u'vxlan', u'gre'], u'bridge_mappings': {u'datacentre': u'br-ex'}, u'supported_vnic_types': [{u'vnic_type': u'normal', u'vif_type': u'ovs', u'vif_details': {}}]}}
2017-11-29 23:52:38.233 56464 WARNING networking_odl.ml2.pseudo_agentdb_binding [req-fde69db7-2e68-4d1b-887c-3d16680cf567 74655be1d3634adc8234db3b9a35d083 231c8ce05e234185ac3db9c98e5eb3bd - default default] No ODL hostconfigs for host overcloud-novacompute-0.opnfvlf.org found in agentdb

More debugging, with input from the devs, indicates that the hostconfig is missing from ODL.

Here is the hostconfig query after the reboot of the compute node.

[root@overcloud-controller-0 neutron]# curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
{
"hostconfigs": {
"hostconfig": [
{
"config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
"host-id": "overcloud-controller-0.opnfvlf.org",
"host-type": "ODL L2"
}
]
}
}
[root@overcloud-controller-0 neutron]#

 

One last data point: on the compute node I did a del-manager followed by a set-manager, and the compute node is now back in the hostconfig and I can boot VMs on it again. A rough sketch of those commands is below.
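(Hedged sketch, not from the original report: the manager target assumes the ODL controller listens on tcp/6640, matching the ovs-vsctl show output later in this issue; substitute your controller IP for ODL_IP.)

# On the affected compute node: drop and re-add the OVSDB manager so the node
# reconnects to ODL and republishes its hostconfig.
ODL_IP=192.0.2.14            # controller IP in this setup; adjust as needed
sudo ovs-vsctl del-manager
sudo ovs-vsctl set-manager "tcp:${ODL_IP}:6640"

# Verify the manager connection and that the hostconfig external_ids are still present.
sudo ovs-vsctl show | grep -A1 Manager
sudo ovs-vsctl --columns=external_ids list Open_vSwitch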


(overcloud) [heat-admin@overcloud-controller-0 log]$ curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
{
"hostconfigs": {
"hostconfig": [
{
"config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
"host-id": "overcloud-controller-0.opnfvlf.org",
"host-type": "ODL L2"
},
{
"config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
"host-id": "overcloud-novacompute-0.opnfvlf.org",
"host-type": "ODL L2"
}
]
}
}
(overcloud) [heat-admin@overcloud-controller-0 log]$
 

From Moshe: another workaround is to restart the opendaylight service.
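(A hedged sketch of that restart workaround; the service name is deployment-specific and "opendaylight" is assumed here, so adjust for containerized deployments.)

# Restart the ODL controller so it re-establishes OVSDB connections and
# repopulates the hostconfigs (service name assumed; adjust to your deployment).
sudo systemctl restart opendaylight

# After karaf is back up, re-check the hostconfigs (same query as above).
curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool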

 

Isaku's email describing some relevant behavior:

Right now Ran Xiao has reported a bug related to hostconfig.
We're looking into it. Due to that issue, the hostconfig entry may not be
created properly. It can be diagnosed by looking at the ODL log.
https://lists.opendaylight.org/pipermail/neutron-dev/2017-November/001472.html
https://jira.opendaylight.org/browse/NEUTRON-151

When a compute node is shut down, the hostconfig entry in MD-SAL is deleted because the ovsdb
connection is deleted. Then the entry in the neutron agentdb is deleted, so
port binding on the shut-down compute node will fail. This is
intentional behavior, as Daya pointed out.

Now suppose the compute node comes back. When port binding occurs and the agentdb
entry is not found or is not alive, networking-odl tries to
refresh the agentdb by retrieving the hostconfig of the compute node from ODL
and updating the agentdb with the new one.
This logic was introduced in the Pike release. Before Pike, a periodic update
was used (30-second polling).

If the hostconfig in MD-SAL isn't populated after rebooting the compute node,
it sounds like a bug somewhere in ODL, probably ODL Neutron Northbound.
I'm quite happy to look into it.
I would look at the MD-SAL hostconfig, and then at the topology to check whether
the corresponding ovsdb entry exists.
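
The two checks Isaku describes roughly correspond to the RESTCONF queries below (a hedged sketch: the per-host path assumes the hostconfig list is keyed by host-id and host-type, with the space in "ODL L2" URL-encoded).

# 1. Hostconfig for the rebooted compute node in MD-SAL (missing here after the reboot).
curl -s -u admin:admin -X GET \
  "http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/hostconfig/overcloud-novacompute-0.opnfvlf.org/ODL%20L2" | python -m json.tool

# 2. OVSDB topology in MD-SAL - the ovsdb node for the compute host should be listed here.
curl -s -u admin:admin -X GET \
  "http://${CIP}:8181/restconf/operational/network-topology:network-topology/topology/ovsdb:1" | python -m json.tool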



 Comments   
Comment by Ritu Sood [ 14/Dec/17 ]

After the reboot of the compute node, can you also check whether the ovsdb tables are populated with the hostconfig data? (ovs-vsctl list Open_vSwitch)

Comment by Victor Pickard [ 14/Dec/17 ]

 

Yes, the tables are populated with the hostconfig after reboot. Here is the dump before the reboot:

[stack@control2 vagrant]$ sudo ovs-vsctl list Open_vSwitch
_uuid               : 02ca8a9e-06b0-4147-8cb4-38e77d882242
bridges             : [53143932-72fd-451f-9465-a1a06ba8a0d0, 9a905ed7-99f1-405e-a6e6-bff5ba3bdacb]
cur_cfg             : 10
datapath_types      : [netdev, system]
db_version          : "7.14.0"
external_ids        : {hostname="control2", "odl_os_hostconfig_config_odl_l2"="{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"02ca8a9e-06b0-4147-8cb4-38e77d882242\", \"host_addresses\": [\"control2\"]}}]}", odl_os_hostconfig_hostid="control2", system-id="0697fb3a-40e3-4d15-b055-9af0f2e34f1c"}
iface_types         : [geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [3cb6b115-bb53-4cb6-88ad-39691c54822a]
next_cfg            : 10
other_config        : {local_ip="192.168.254.251"}
ovs_version         : "2.6.1"
ssl                 : []
statistics          : {}
system_type         : centos
system_version      : "7"
[stack@control2 vagrant]$ sudo shutdown -r now

 

Dump after reboot

 

[stack@control2 vagrant]$ sudo ovs-vsctl list Open_vSwitch
_uuid               : 02ca8a9e-06b0-4147-8cb4-38e77d882242
bridges             : [53143932-72fd-451f-9465-a1a06ba8a0d0, 9a905ed7-99f1-405e-a6e6-bff5ba3bdacb]
cur_cfg             : 10
datapath_types      : [netdev, system]
db_version          : "7.14.0"
external_ids        : {hostname="control2", "odl_os_hostconfig_config_odl_l2"="{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"02ca8a9e-06b0-4147-8cb4-38e77d882242\", \"host_addresses\": [\"control2\"]}}]}", odl_os_hostconfig_hostid="control2", system-id="0697fb3a-40e3-4d15-b055-9af0f2e34f1c"}
iface_types         : [geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [3cb6b115-bb53-4cb6-88ad-39691c54822a]
next_cfg            : 10
other_config        : {local_ip="192.168.254.251"}
ovs_version         : "2.6.1"
ssl                 : []
statistics          : {}
system_type         : centos
system_version      : "7"
[stack@control2 vagrant]$

Comment by Ritu Sood [ 16/Dec/17 ]

That's good.

One other thing to verify is whether, after the reboot, the ovsdb manager is set before the ovsdb tables are updated with the hostconfig information. If the sequence is reversed, that can cause ODL to miss the hostconfig update.
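
(Not from the original thread, but one hedged way to check that ordering on the compute node right after boot:)

# Poll the manager target and the hostconfig external_id for the first minute after boot.
# If get-manager already returns a target while the external_id is still missing,
# the manager was set before the ovsdb tables were updated with the hostconfig.
for i in $(seq 1 30); do
    echo "--- $(date +%T) ---"
    sudo ovs-vsctl get-manager
    sudo ovs-vsctl get Open_vSwitch . external_ids | grep -o odl_os_hostconfig_config_odl_l2 \
        || echo "hostconfig external_id not set yet"
    sleep 2
done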

Comment by Isaku Yamahata [ 20/Dec/17 ]

The ovsdb table looks good. What about the ovsdb entry in the topology in MD-SAL?

Does ODL ovsdb handle reboots properly?

 

Comment by Victor Pickard [ 20/Dec/17 ]

The OVSDB node is not in the topology in MD-SAL after reboot.

 

Here is the compute node info. I have attached two files, one with the topology before the reboot and one with the topology after. You will see that this node is not present after the node reboot.

topologyBeforeRebootCompute

 

topologyAfterRebootCompute

 

[heat-admin@overcloud-novacompute-0 ~]$ sudo ovs-vsctl show
2e084435-25bc-4037-b1c5-30b90f6f983d
   Manager "tcp:192.0.2.14:6640"
       is_connected: true
   Bridge br-ex
       fail_mode: standalone
       Port br-ex
           Interface br-ex
               type: internal
       Port "eth2"
           Interface "eth2"
   Bridge br-int
       Controller "tcp:192.0.2.14:6653"
           is_connected: true
       fail_mode: secure
       Port br-int
           Interface br-int
               type: internal
       Port br-ex-patch
           Interface br-ex-patch
               type: patch
               options: {peer=br-ex-int-patch}
       Port "tuncd01fa2b4fa"
           Interface "tuncd01fa2b4fa"
               type: vxlan
               options: {key=flow, local_ip="11.0.0.26", remote_ip="11.0.0.25"}
   ovs_version: "2.7.3"
[heat-admin@overcloud-novacompute-0 ~]$

Comment by Isaku Yamahata [ 20/Dec/17 ]

I'm asking about the value in the MD-SAL ovsdb model, not the ovsdb table in OVS.

 

Comment by Victor Pickard [ 03/Jan/18 ]

Isaku,

Look at my earlier comment from 20/Dec/17. The two links in that comment are the MD-SAL dumps. You can see that the OVSDB node was not in the topology after the reboot. So this appears to be an OVSDB bug; I'll move the bug accordingly.

Comment by Victor Pickard [ 22/Aug/18 ]

 

I'm still seeing this with Queens/Oxygen.

 

Some karaf logs that may be of interest. Will keep digging.


2018-08-22T19:39:23,469 | ERROR | OVSDBConnNotifSer-9 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Old connection still hanging for Entity{type='ovsdb', id=/(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/922f7894-5e8f-4b34-8284-8029f3934312}]}
2018-08-22T19:39:23,469 | INFO | OVSDBConnNotifSer-9 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:49846 to /10.8.125.230:6640. Cleaning up the operational data store
2018-08-22T19:39:23,469 | INFO | OVSDBConnNotifSer-9 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected ConnectionInfo [_localIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.230]], _localPort=PortNumber [_value=6640], _remoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.231]], _remotePort=PortNumber [_value=49846], augmentation=[]] this controller does not have ownership


2018-08-22T19:42:09,781 | ERROR | nioEventLoopGroup-5-5 | ExceptionHandler | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Exception occurred while processing connection pipeline
io.netty.handler.timeout.ReadTimeoutException: null
2018-08-22T19:42:09,784 | INFO | nioEventLoopGroup-5-5 | ExceptionHandler | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Closing channel to ovsdb [id: 0x51eb2fe8, L:/10.8.125.230:6640 - R:/10.8.125.231:53276]
2018-08-22T19:42:09,784 | INFO | nioEventLoopGroup-5-5 | OvsdbConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Connection closed ConnectionInfo [Remote-address=10.8.125.231, Remote-port=53276, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]
2018-08-22T19:42:09,785 | INFO | nioEventLoopGroup-5-5 | HwvtepConnectionManager | 388 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:53276 to /10.8.125.230:6640. Cleaning up the operational data store
2018-08-22T19:42:09,785 | WARN | nioEventLoopGroup-5-5 | HwvtepConnectionManager | 388 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.6.4.SNAPSHOT | HWVTEP disconnected event did not find connection instance for ConnectionInfo [_localIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.230]], _localPort=PortNumber [_value=6640], _remoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.231]], _remotePort=PortNumber [_value=53276], augmentation=[]]
2018-08-22T19:42:09,786 | INFO | nioEventLoopGroup-5-5 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:53276 to /10.8.125.230:6640. Cleaning up the operational data store
2018-08-22T19:42:09,786 | INFO | nioEventLoopGroup-5-5 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected ConnectionInfo [_localIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.230]], _localPort=PortNumber [_value=6640], _remoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.231]], _remotePort=PortNumber [_value=53276], augmentation=[]] this controller instance has ownership
2018-08-22T19:42:09,787 | INFO | nioEventLoopGroup-5-5 | StalePassiveConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Client disconnected ConnectionInfo [Remote-address=10.8.125.231, Remote-port=53276, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]
2018-08-22T19:42:09,789 | WARN | opendaylight-cluster-data-notification-dispatcher-729 | OvsdbNodeListener | 255 - org.opendaylight.genius.itm-impl - 0.4.4.SNAPSHOT | processBridgeUpdate: bridge br-int removal case when Switch is disconnected.Hence, Ovsdb Node could not be fetched from Oper DS.
2018-08-22T19:42:09,790 | WARN | opendaylight-cluster-data-notification-dispatcher-729 | OvsdbNodeListener | 255 - org.opendaylight.genius.itm-impl - 0.4.4.SNAPSHOT | processBridgeUpdate: bridge br-ex removal case when Switch is disconnected.Hence, Ovsdb Node could not be fetched from Oper DS.
2018-08-22T19:42:09,798 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-4 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Entity{type='ovsdb', id=/(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/922f7894-5e8f-4b34-8284-8029f3934312}]} has no owner, cleaning up the operational data store
Comment by Victor Pickard [ 22/Aug/18 ]

Here is the hostconfig before rebooting compute node compute1 (sudo shutdown -r now). Notice that compute1 is missing from the hostconfig in the second query (it is there initially, but it takes some time, roughly a minute, before it is removed).

The workaround, until this bug is resolved, is to do a del-manager followed by a set-manager on the compute node.

[Vic@rippleRider$]$ curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
{
"hostconfigs": {
"hostconfig": [
{
"config": "{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"554a442a-34fa-4ead-845f-7a2f14fea0a7\", \"host_addresses\": [\"control\"]}}]}",
"host-id": "control",
"host-type": "ODL L2"
},
{
"config": "{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"922f7894-5e8f-4b34-8284-8029f3934312\", \"host_addresses\": [\"compute1\"]}}]}",
"host-id": "compute1",
"host-type": "ODL L2"
},
{
"config": "{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"4c53d595-7d24-450e-96c7-edbac64ffb7f\", \"host_addresses\": [\"compute2\"]}}]}",
"host-id": "compute2",
"host-type": "ODL L2"
}
]
}
}

[Vic@rippleRider$]$ curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
{
"hostconfigs": {
"hostconfig": [
{
"config": "{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"554a442a-34fa-4ead-845f-7a2f14fea0a7\", \"host_addresses\": [\"control\"]}}]}",
"host-id": "control",
"host-type": "ODL L2"
},
{
"config": "{\"allowed_network_types\": [\"local\", \"flat\", \"vlan\", \"vxlan\", \"gre\"], \"bridge_mappings\": {}, \"datapath_type\": \"system\", \"supported_vnic_types\": [{\"vif_type\": \"ovs\", \"vnic_type\": \"normal\", \"vif_details\": {\"support_vhost_user\": false, \"has_datapath_type_netdev\": false, \"uuid\": \"4c53d595-7d24-450e-96c7-edbac64ffb7f\", \"host_addresses\": [\"compute2\"]}}]}",
"host-id": "compute2",
"host-type": "ODL L2"
}
]
}
}

Comment by Victor Pickard [ 23/Aug/18 ]

Here is my analysis of this issue:

1. The switch/compute node is rebooted.
2. A new connection request comes in. OVSDB still has the old connection info and has not yet detected that the old connection went down.
3. OVSDB initiates a ping/echo on the old connection, which fails. OVSDB notifies clients, but there is NO CLEANUP of the old connection.
4. The new connection is closed, because registerEntityForOwnership() fails while the old connection is still in the ovsdb connection map.
5. The old connection is closed later from ExceptionHandler (ReadTimeoutException), a couple of minutes later per the timestamps below.
6. OVSDB thinks there is no connection, so there is no hostconfig in the operational data store.
7. But the connection is really still there (see the netstat output below).

Proposed Fix
===========
In step 3, the old connection has failed the echo test, but the only action taken is to notify clients. It would be better to properly close and clean up the old connection at that point, instead of waiting for the ReadTimeoutException a couple of minutes later.

New connection attempt from switch just after reboot
=========================================
2018-08-22T22:57:21,427 | INFO | OVSDBConnNotifSer-28 | StalePassiveConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Adding client to pending list ConnectionInfo [Remote-address=10.8.125.231, Remote-port=34176, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]
2018-08-22T22:57:21,427 | INFO | OVSDBConnNotifSer-28 | StalePassiveConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Echo testing client ConnectionInfo [Remote-address=10.8.125.231, Remote-port=45016, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]

Testing of old connection – fails in 1 sec, clients are notified.
=============================================
2018-08-22T22:57:22,427 | INFO | OVSDB-Lib-Future-Reaper-1 | StalePassiveConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Echo testing of old client ConnectionInfo [Remote-address=10.8.125.231, Remote-port=45016, Local-address10.8.125.230, Local-port=6640, type=PASSIVE] failed

2018-08-22T22:57:22,428 | INFO | OVSDB-Lib-Future-Reaper-1 | StalePassiveConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Sending notification for client ConnectionInfo [Remote-address=10.8.125.231, Remote-port=34176, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]

Switch connects to controller after reboot, new connection
============================================
2018-08-22T22:57:22,429 | INFO | OVSDBConnNotifSer-29 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library connected PASSIVE from /10.8.125.231:34176 to /10.8.125.230:6640

This registers the entity for ownership via registerEntityForOwnership(), which fails; see the next logs.

Old connection still there, so new connection is rejected, and ovsdb closes the new connection....
=========================================================================
2018-08-22T22:57:22,439 | ERROR | OVSDBConnNotifSer-29 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Old connection still hanging for Entity{type='ovsdb', id=/(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/922f7894-5e8f-4b34-8284-8029f3934312}]}
2018-08-22T22:57:22,440 | INFO | OVSDBConnNotifSer-29 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:34176 to /10.8.125.230:6640. Cleaning up the operational data store

The read timeout exception (from the echo/ping failure on the old connection) fires and closes the connection, which ends up removing the host from the hostconfig
===============================================================================================

2018-08-22T23:00:09,548 | ERROR | nioEventLoopGroup-5-7 | ExceptionHandler | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Exception occurred while processing connection pipeline
io.netty.handler.timeout.ReadTimeoutException: null
2018-08-22T23:00:09,549 | INFO | nioEventLoopGroup-5-7 | ExceptionHandler | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Closing channel to ovsdb [id: 0x35d1f7ce, L:/10.8.125.230:6640 - R:/10.8.125.231:45016]
2018-08-22T23:00:09,549 | INFO | nioEventLoopGroup-5-7 | OvsdbConnectionService | 389 - org.opendaylight.ovsdb.library - 1.6.4.SNAPSHOT | Connection closed ConnectionInfo [Remote-address=10.8.125.231, Remote-port=45016, Local-address10.8.125.230, Local-port=6640, type=PASSIVE]
2018-08-22T23:00:09,550 | INFO | nioEventLoopGroup-5-7 | HwvtepConnectionManager | 388 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:45016 to /10.8.125.230:6640. Cleaning up the operational data store
2018-08-22T23:00:09,550 | WARN | nioEventLoopGroup-5-7 | HwvtepConnectionManager | 388 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.6.4.SNAPSHOT | HWVTEP disconnected event did not find connection instance for ConnectionInfo [_localIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.230]], _localPort=PortNumber [_value=6640], _remoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.231]], _remotePort=PortNumber [_value=45016], augmentation=[]]
2018-08-22T23:00:09,551 | INFO | nioEventLoopGroup-5-7 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected PASSIVE from /10.8.125.231:45016 to /10.8.125.230:6640. Cleaning up the operational data store
2018-08-22T23:00:09,551 | INFO | nioEventLoopGroup-5-7 | OvsdbConnectionManager | 393 - org.opendaylight.ovsdb.southbound-impl - 1.6.4.SNAPSHOT | Library disconnected ConnectionInfo [_localIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.230]], _localPort=PortNumber [_value=6640], _remoteIp=IpAddress [_ipv4Address=Ipv4Address [_value=10.8.125.231]], _remotePort=PortNumber [_value=45016], augmentation=[]] this controller instance has ownership

Connection is really there between Controller and OVS switch
===============================================

[stack@control log]$ netstat -punta|grep 6640
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 10.8.125.230:33252 10.8.125.230:6640 ESTABLISHED -
tcp6 0 0 :::6640 :::* LISTEN 10732/java
tcp6 0 0 10.8.125.230:6640 10.8.125.231:34176 ESTABLISHED 10732/java
tcp6 0 0 10.8.125.230:6640 10.8.125.230:33252 ESTABLISHED 10732/java
tcp6 0 0 10.8.125.230:6640 10.8.125.232:59698 ESTABLISHED 10732/java

Last login: Thu Aug 23 00:45:32 2018 from 192.168.121.1
[vagrant@compute1 ~]$ sudo netstat -punta |grep 6640
tcp 0 0 10.8.125.231:34176 10.8.125.230:6640 ESTABLISHED 794/ovsdb-server

Comment by Victor Pickard [ 23/Aug/18 ]

https://git.opendaylight.org/gerrit/#/c/75430/

Comment by Victor Pickard [ 23/Aug/18 ]

For reference, it looks like a similar issue was reported in https://jira.opendaylight.org/browse/OVSDB-439

Corresponding patch:
https://git.opendaylight.org/gerrit/#/c/71303/
