NETVIRT-1019: CSIT Sporadic failures - port leak in tempest suite

    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • Component: General
    • Severity: Normal

      Some ports are not cleaned up during our tempest suite, and the suite teardown
      is unable to delete the external network because of these leftover ports.

      Functionally, I think all the tempest test cases are passing for the most
      part. We are only catching this because our suite teardown is failing to
      clean up. This is probably one of the resource leaks others have found from
      time to time.
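
      For illustration, an openstacksdk sketch like the one below is enough to see
      which ports are still hanging off the external network when the teardown
      tries to delete it (the cloud and network names are placeholders, not the
      values the CSIT job actually uses):

      import openstack

      # Placeholder cloud/network names; substitute whatever the job configures.
      conn = openstack.connect(cloud='devstack')
      ext_net = conn.network.find_network('external-net')

      leftovers = list(conn.network.ports(network_id=ext_net.id))
      for port in leftovers:
          # device_owner/device_id usually hint at which resource leaked the port
          print('%s %s %s %s' % (port.id, port.mac_address,
                                 port.device_owner, port.device_id))
      print('%d ports still attached to %s' % (len(leftovers), ext_net.name))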

      I can't find the smoking gun, and it could be coming from multiple tempest
      test cases, but I think we are getting leftovers even from the
      tempest.api.network group of tests. Now that we are collecting debug logs in
      the teardown() portion of each tempest.scenario test, we can see a list of
      the ports that exist. Comparing a passing job [0] with a failing job [1],
      the passing job already has fewer ports right after tempest.api.network. At
      the very end of those jobs, the passing case has only 4 ports in use (I'm
      assuming those are the ports being used by that one last tempest scenario
      case), while the failing job still has 10 ports, so we've probably leaked
      6 ports at that point.
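
      A rough way to narrow down which test group is leaking would be to snapshot
      the port list before and after each group and diff the two, along these
      lines (again just a sketch with a placeholder cloud name, not something the
      suite does today):

      import openstack

      conn = openstack.connect(cloud='devstack')   # placeholder cloud name

      def port_snapshot():
          # map port id -> MAC so a leaked port can be chased in karaf.log later
          return {p.id: p.mac_address for p in conn.network.ports()}

      before = port_snapshot()
      # ... run one tempest group here, e.g. tempest.api.network ...
      after = port_snapshot()

      for port_id in set(after) - set(before):
          print('possible leak: port %s mac %s' % (port_id, after[port_id]))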

      I saw an update_port_precommit exception when digging around:

      2017-11-20 10:14:20.192 16601 DEBUG neutron.plugins.ml2.managers [req-0e73cc7b-709c-422d-a3cd-50e84233e2fc - -] DB exception raised by Mechanism driver 'opendaylight_v2' in update_port_precommit _call_on_drivers /opt/stack/neutron/neutron/plugins/ml2/managers.py:433
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File /opt/stack/neutron/neutron/plugins/ml2/managers.py, line 426, in _call_on_drivers
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File /usr/lib/python2.7/site-packages/oslo_log/helpers.py, line 67, in wrapper
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     return method(*args, **kwargs)
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File /opt/stack/networking-odl/networking_odl/ml2/mech_driver_v2.py, line 139, in update_port_precommit
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     context, odl_const.ODL_PORT, odl_const.ODL_UPDATE)
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File /opt/stack/networking-odl/networking_odl/ml2/mech_driver_v2.py, line 109, in _record_in_journal
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     ml2_context=context)
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File /opt/stack/networking-odl/networking_odl/journal/journal.py, line 121, in record
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     raise exception.RetryRequest(e)
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers RetryRequest
      2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers 

      I can also take the MAC address of a port I think has leaked, grep the
      karaf.log, and see that it got an "add event" but never a "remove event",
      whereas the passing job's karaf.log shows both the add and the remove.
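
      That grep can be scripted along these lines (just a sketch; the MAC below is
      a placeholder and the exact wording of the karaf add/remove messages may
      differ, so the matching is intentionally loose):

      import re

      mac = 'fa:16:3e:00:00:01'   # placeholder; use the leaked port's MAC
      adds, removes = [], []

      with open('karaf.log') as log:
          for line in log:
              if mac not in line.lower():
                  continue
              if re.search(r'\badd(ed)?\b', line, re.IGNORECASE):
                  adds.append(line.rstrip())
              if re.search(r'\bremoved?\b', line, re.IGNORECASE):
                  removes.append(line.rstrip())

      print('add events: %d, remove events: %d' % (len(adds), len(removes)))
      # a leaked port typically shows add events with no matching remove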

      [0] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-nitrogen/443/log_tempest.html.gz
      [1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-pike-upstream-stateful-nitrogen/99/log_tempest.html.gz

            Assignee: Jamo Luhrsen (jluhrsen)
            Reporter: Jamo Luhrsen (jluhrsen)
            Votes: 0
            Watchers: 6
