Bug | Resolution: Done | Medium | Normal
Some ports are not cleaned up during our tempest suite, and the suite teardown
cannot delete the external network while these leftover ports remain on it.
Functionally, I think all the tempest test cases are passing for the most
part; we are only catching this because our suite teardown fails to clean
up. This is probably one of the resource leaks others have found from time
to time.
I can't find the smoking gun, and it could be coming from multiple tempest
test cases, but I think we are getting leftovers even from the tempest.api.network
group of tests. Now that we collect debug logs in the teardown() portion of
each tempest.scenario test, we can see a list of the ports that exist.
Comparing a passing job [0] against the failing job [1], the passing job
already shows fewer ports after tempest.api.network. Then, at the very end of
those jobs, the passing case has only 4 ports in use (I'm assuming those
belong to the one last tempest scenario case), but the failing job still
shows 10 ports, so we've probably leaked 6 ports at that point.
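To make that comparison concrete, here is a minimal sketch of the kind of port-count diff those teardown logs let us do, assuming openstacksdk is installed and a clouds.yaml entry named 'devstack' exists (both assumptions; this is not the actual suite code):

    # Snapshot port IDs around a test phase and print whatever was left behind.
    import openstack

    def port_ids(conn):
        # IDs of every Neutron port the cloud currently knows about.
        return {port.id for port in conn.network.ports()}

    conn = openstack.connect(cloud='devstack')  # cloud name is an assumption
    before = port_ids(conn)
    # ... run a tempest group such as tempest.api.network here ...
    after = port_ids(conn)
    leaked = after - before
    print('%d leftover ports: %s' % (len(leaked), sorted(leaked)))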
I saw an update_port_precommit exception when digging around:
2017-11-20 10:14:20.192 16601 DEBUG neutron.plugins.ml2.managers [req-0e73cc7b-709c-422d-a3cd-50e84233e2fc - -] DB exception raised by Mechanism driver 'opendaylight_v2' in update_port_precommit _call_on_drivers /opt/stack/neutron/neutron/plugins/ml2/managers.py:433
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 426, in _call_on_drivers
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/site-packages/oslo_log/helpers.py", line 67, in wrapper
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     return method(*args, **kwargs)
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File "/opt/stack/networking-odl/networking_odl/ml2/mech_driver_v2.py", line 139, in update_port_precommit
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     context, odl_const.ODL_PORT, odl_const.ODL_UPDATE)
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File "/opt/stack/networking-odl/networking_odl/ml2/mech_driver_v2.py", line 109, in _record_in_journal
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     ml2_context=context)
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers   File "/opt/stack/networking-odl/networking_odl/journal/journal.py", line 121, in record
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers     raise exception.RetryRequest(e)
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers RetryRequest
2017-11-20 10:14:20.192 16601 ERROR neutron.plugins.ml2.managers
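For context, RetryRequest is oslo.db's signal that the enclosing DB transaction should be re-run rather than a hard failure, so this traceback on its own may be benign. Below is a hand-rolled sketch of that retry contract (the wrapper, the flaky_precommit helper, and the retry count are all illustrative, not neutron's actual retry code):

    from oslo_db import exception as db_exc

    def call_with_retry(precommit, max_retries=3):
        # Hand-rolled analogue of neutron's DB retry wrapper: re-run the
        # precommit callable whenever a driver raises RetryRequest.
        for attempt in range(max_retries):
            try:
                return precommit()
            except db_exc.RetryRequest as req:
                if attempt == max_retries - 1:
                    # Out of retries: surface the original DB error.
                    raise req.inner_exc

    state = {'calls': 0}

    def flaky_precommit():
        # Illustrative stand-in for update_port_precommit: fail once with a
        # simulated transient DB error, then succeed on the retry.
        state['calls'] += 1
        if state['calls'] == 1:
            raise db_exc.RetryRequest(RuntimeError('transient DB error'))
        return 'journal row recorded'

    print(call_with_retry(flaky_precommit))  # retried once, then succeeds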
I can also take the MAC address of a port I think has leaked, grep
karaf.log, and see that it got an "add event" but never a "remove event",
whereas in the passing job's karaf.log you can see both the add and the remove.
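For reference, a rough sketch of that karaf.log check in script form; the "add"/"remove" substring matching is an assumption about the log wording, so treat it as illustrative only:

    # Scan karaf.log for MAC addresses that appear on an "add" line but
    # never on a "remove" line. The substring checks are assumptions about
    # the log format and will also match words like "address".
    import re

    mac_re = re.compile(r'(?i)\b[0-9a-f]{2}(?::[0-9a-f]{2}){5}\b')
    added, removed = set(), set()
    with open('karaf.log') as log:
        for line in log:
            macs = mac_re.findall(line)
            if not macs:
                continue
            lowered = line.lower()
            if 'remove' in lowered:
                removed.update(macs)
            elif 'add' in lowered:
                added.update(macs)
    print('added but never removed:', sorted(added - removed))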
[0] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-nitrogen/443/log_tempest.html.gz
[1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-pike-upstream-stateful-nitrogen/99/log_tempest.html.gz