  1. ovsdb
  2. OVSDB-444

Port binding failure after rebooting compute node - hostconfig missing


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • Fluorine
    • Oxygen-SR4, Fluorine
    • None

      I have a Pike/Carbon TripleO-based setup.

      Spin up a VM on the compute node, verify that it gets an IP and can ping. All is good.

      Now delete the VM and reboot the compute node (sudo shutdown -r now).

      The compute node comes up, and the tunnel between compute and control is up (good stuff). But when I boot the VM, port binding fails, as shown below.

      It looks like networking_odl sees the agent as "dead". I tried booting the VM again and hit the same issue (I was wondering whether it took some time to see the agent as "alive").

      This log is generated:

       

       

      2017-11-29 23:52:38.233 56464 WARNING networking_odl.ml2.pseudo_agentdb_binding [req-fde69db7-2e68-4d1b-887c-3d16680cf567 74655be1d3634adc8234db3b9a35d083 231c8ce05e234185ac3db9c98e5eb3bd - default default] Refusing to bind port 13029ea7-9821-4796-a1d3-0640a8b14873 to dead pseudo agent: {'binary': u'neutron-odlagent-portbinding', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 11, 29, 23, 46, 18), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'overcloud-novacompute-0.opnfvlf.org', 'agent_type': u'ODL L2', 'resource_versions': {}, 'created_at': datetime.datetime(2017, 11, 28, 2, 40, 59), 'started_at': datetime.datetime(2017, 11, 28, 2, 40, 59), 'id': u'dddc4a58-fee2-4788-b560-433be36f64fd', 'configurations': {u'allowed_network_types': [u'local', u'vlan', u'vxlan', u'gre'], u'bridge_mappings': {u'datacentre': u'br-ex'}, u'supported_vnic_types': [{u'vnic_type': u'normal', u'vif_type': u'ovs', u'vif_details': {}}]}}
      2017-11-29 23:52:38.233 56464 WARNING networking_odl.ml2.pseudo_agentdb_binding [req-fde69db7-2e68-4d1b-887c-3d16680cf567 74655be1d3634adc8234db3b9a35d083 231c8ce05e234185ac3db9c98e5eb3bd - default default] No ODL hostconfigs for host overcloud-novacompute-0.opnfvlf.org found in agentdb
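The log above reports the agent as not alive, with a heartbeat_timestamp several minutes older than the log entry. As a rough illustration of why Neutron would treat such an agent as dead, the sketch below computes the heartbeat age from the two timestamps in the log. It assumes Neutron's default agent_down_time of 75 seconds (this deployment may override it) and that both timestamps come from the same clock and timezone; the function names are made up for this example.

```python
from datetime import datetime, timedelta

# Assumption: Neutron's default agent_down_time (75 s); not confirmed for this setup.
AGENT_DOWN_TIME = timedelta(seconds=75)


def heartbeat_age(log_time, heartbeat):
    """Return how old the last heartbeat was when the log line was written."""
    return log_time - heartbeat


def is_agent_down(log_time, heartbeat, down_time=AGENT_DOWN_TIME):
    """An agent counts as down once its heartbeat is older than down_time."""
    return heartbeat_age(log_time, heartbeat) > down_time


# Timestamps copied from the log (milliseconds dropped).
log_time = datetime(2017, 11, 29, 23, 52, 38)
heartbeat = datetime(2017, 11, 29, 23, 46, 18)

age = heartbeat_age(log_time, heartbeat)
print(age.total_seconds(), is_agent_down(log_time, heartbeat))
```

Under those assumptions the heartbeat is 380 seconds old, well past the threshold, which is consistent with the 'alive': False field in the log.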
      

       

       

      More debugging, with input from the devs, indicates that the hostconfig is missing from ODL.

       

      Here is the hostconfig query after rebooting the compute node.

      [root@overcloud-controller-0 neutron]# curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
      {
          "hostconfigs": {
              "hostconfig": [
                  {
                      "config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
                      "host-id": "overcloud-controller-0.opnfvlf.org",
                      "host-type": "ODL L2"
                  }
              ]
          }
      }
      [root@overcloud-controller-0 neutron]#
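The check done by eye on the query output above can also be scripted. The following is a minimal sketch that parses a hostconfigs response body and lists the expected hosts that have no entry. The JSON layout is taken from the output above; the function names and the idea of passing in a list of expected hosts are assumptions for illustration, and the sample body is a trimmed copy of the response rather than a live query.

```python
import json


def hosts_with_hostconfig(response_text):
    """Return the set of host-id values found in a hostconfigs response body."""
    data = json.loads(response_text)
    entries = data.get("hostconfigs", {}).get("hostconfig", [])
    return {entry["host-id"] for entry in entries}


def missing_hosts(response_text, expected_hosts):
    """Return the expected hosts that have no hostconfig entry, sorted."""
    return sorted(set(expected_hosts) - hosts_with_hostconfig(response_text))


# Trimmed copy of the response seen after the compute node reboot.
sample = json.dumps({
    "hostconfigs": {
        "hostconfig": [
            {"host-id": "overcloud-controller-0.opnfvlf.org", "host-type": "ODL L2"}
        ]
    }
})

expected = [
    "overcloud-controller-0.opnfvlf.org",
    "overcloud-novacompute-0.opnfvlf.org",
]
print(missing_hosts(sample, expected))
```

With the post-reboot response, only the compute node is reported as missing, matching the "No ODL hostconfigs for host" warning in the Neutron log.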

       

      One last data point: on the compute node, I ran del-manager and then set-manager. The compute node is now in the hostconfigs, and I can boot VMs on it again.

      
      

      (overcloud) [heat-admin@overcloud-controller-0 log]$ curl -s -u admin:admin -X GET http://${CIP}:8181/restconf/operational/neutron:neutron/hostconfigs/ | python -m json.tool
      {
          "hostconfigs": {
              "hostconfig": [
                  {
                      "config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
                      "host-id": "overcloud-controller-0.opnfvlf.org",
                      "host-type": "ODL L2"
                  },
                  {
                      "config": "{ \"supported_vnic_types\": [{ \"vnic_type\": \"normal\", \"vif_type\": \"ovs\", \"vif_details\": {} }], \"allowed_network_types\": [\"local\",\"vlan\",\"vxlan\",\"gre\"], \"bridge_mappings\": {\"datacentre\":\"br-ex\"}}",
                      "host-id": "overcloud-novacompute-0.opnfvlf.org",
                      "host-type": "ODL L2"
                  }
              ]
          }
      }
      (overcloud) [heat-admin@overcloud-controller-0 log]$
       

      From Moshe: another workaround is to restart the opendaylight service.

       

      Isaku's email describing some relevant behavior:

      Right now Ran Xiao has reported a bug related to hostconfig.
      We're looking into it. Due to the issue, the hostconfig entry may not be
      created properly. It can be diagnosed by looking at the ODL log.
      https://lists.opendaylight.org/pipermail/neutron-dev/2017-November/001472.html
      https://jira.opendaylight.org/browse/NEUTRON-151

      When a compute node is shut down, the hostconfig entry in MD-SAL is deleted because the ovsdb
      connection is deleted. Then the entry in the neutron agentdb is deleted, so
      port binding on the shut-down compute node will fail. This is
      intentional behavior, as Daya pointed out.

      Then, suppose the compute node is back. When port binding occurs and the agentdb
      entry is not found or is not alive, networking-odl tries to
      refresh agentdb by retrieving the compute node's hostconfig from ODL
      and updating agentdb with the new one.
      This logic was introduced in the Pike release. Before Pike, a periodic update
      with 30-second polling was used.
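The refresh-on-bind behavior described in the email can be sketched roughly as follows. This is not networking-odl's actual code: the agentdb is modeled as a plain dict, and bind_port and fetch_hostconfig are hypothetical names standing in for the real binding path and the hostconfig lookup against ODL.

```python
def bind_port(host, agentdb, fetch_hostconfig):
    """Return True if the port can be bound on host, refreshing agentdb if needed.

    agentdb: dict mapping host name -> {"alive": bool, "configurations": ...}
    fetch_hostconfig: callable returning the host's hostconfig from ODL, or None
    """
    agent = agentdb.get(host)
    if agent is None or not agent["alive"]:
        # Entry missing or dead: try to refresh it from ODL before giving up.
        hostconfig = fetch_hostconfig(host)
        if hostconfig is None:
            # ODL has no hostconfig for this host, so binding is refused.
            return False
        agentdb[host] = {"alive": True, "configurations": hostconfig}
    return True


# Model of the failure in this issue: dead agent entry, ODL has no hostconfig.
agentdb = {"overcloud-novacompute-0.opnfvlf.org": {"alive": False, "configurations": {}}}
odl_hostconfigs = {}
result_before = bind_port("overcloud-novacompute-0.opnfvlf.org", agentdb, odl_hostconfigs.get)

# After the hostconfig reappears in ODL, the same call refreshes agentdb and succeeds.
odl_hostconfigs["overcloud-novacompute-0.opnfvlf.org"] = {"bridge_mappings": {"datacentre": "br-ex"}}
result_after = bind_port("overcloud-novacompute-0.opnfvlf.org", agentdb, odl_hostconfigs.get)
print(result_before, result_after)
```

In this model the refresh only helps once ODL holds a hostconfig for the host, which is why the missing entry after reboot keeps port binding failing on every retry.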

      If hostconfig in MD-SAL isn't populated after rebooting the compute node,
      it sounds like a bug somewhere in ODL, probably in ODL Neutron Northbound.
      I'm quite happy to look into it.
      I would look at the MD-SAL hostconfig, and then at the topology to check whether the
      corresponding ovsdb entry exists.
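The two-step check suggested in the email (hostconfig first, then the ovsdb topology) can be written down as a small triage helper. This is only a sketch: it assumes the host names have already been extracted from the MD-SAL hostconfigs and from the ovsdb topology by some other means, and the function name and result strings are invented for illustration.

```python
def triage_host(host, hostconfig_hosts, topology_hosts):
    """Classify a host by where it shows up in ODL's operational data.

    hostconfig_hosts: host names that have an MD-SAL hostconfig entry
    topology_hosts: host names that have an ovsdb node in the topology
    """
    if host in hostconfig_hosts:
        return "hostconfig present"
    if host in topology_hosts:
        # ovsdb is connected, yet no hostconfig was created for the node.
        return "ovsdb node present, hostconfig missing"
    return "no ovsdb node, no hostconfig"


compute = "overcloud-novacompute-0.opnfvlf.org"
controller = "overcloud-controller-0.opnfvlf.org"

# Hypothetical inputs: hostconfig only for the controller, both nodes in the topology.
print(triage_host(compute, {controller}, {controller, compute}))
```

The middle result is the case the email points at: the ovsdb connection exists, but the hostconfig was not repopulated.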

            vpickard Victor Pickard