Type: Bug
Resolution: Cannot Reproduce
Affects Version: Carbon
Operating System: All
Platform: All
External issue ID: 9064
Priority: Highest
Description of problem:
While running low-stress longevity tests using Browbeat+Rally (creating 40 neutron resources 2 at a time and deleting them, over and over again) against a clustered ODL setup, ODL on controller-1 hits OOM about 42 hours into the test. At that point ODL on controller-2 is still functional, while ODL on controller-0 appears to be running with its ports up but is non-functional (see BZ https://bugzilla.redhat.com/show_bug.cgi?id=1486060). When ODL on controller-0 is restarted to make it functional again at around 16:01 UTC on 08/28/2017, ODL on controller-2 dies at around 16:04 UTC on 08/28/2017. ODL on controller-1, which hit the OOM, is left alone.
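For reference, one way to distinguish a "running but non-functional" member from a healthy one is to query the datastore shard state over the Jolokia endpoint that ODL's karaf exposes. This is a minimal sketch, assuming Jolokia on port 8181 with default admin/admin credentials; the host, member name, and shard name below are placeholders for this setup, not values taken from the report.

```python
# Minimal sketch: query an ODL cluster member's shard RaftState via Jolokia.
# Host, credentials, member name, and shard name are assumed placeholders.
import requests

HOST = "controller-0"   # placeholder
MEMBER = "member-1"     # placeholder: this node's cluster member name
MBEAN = ("org.opendaylight.controller:type=DistributedConfigDatastore,"
         f"Category=Shards,name={MEMBER}-shard-default-config")

url = f"http://{HOST}:8181/jolokia/read/{MBEAN}"
resp = requests.get(url, auth=("admin", "admin"), timeout=10)
value = resp.json().get("value", {})

# A healthy member reports Leader or Follower with a known shard leader;
# a wedged member may time out, error out, or report no leader.
print("RaftState:", value.get("RaftState"))
print("Leader:   ", value.get("Leader"))
```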
Here the karaf process count can be seen dropping to 0 on controller-2 at around 16:04 UTC on 08/28/2017: https://snapshot.raintank.io/dashboard/snapshot/chxdQkhAw3X8l9LS2HNNzCZGQHQvWubO?orgId=2
The heap is dumped before the process dies; however, it is clear that the 2G heap limit is never reached: https://snapshot.raintank.io/dashboard/snapshot/RMuDksXZ61ql2kMA47wqBHUXQeYWG05g?orgId=2
Maximum heap used is around 1.4G.
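The heap graph above can also be approximated without Grafana by sampling the JVM's standard Memory MBean through the same Jolokia endpoint. A minimal sketch, again assuming port 8181 and default admin/admin credentials (deployment-specific assumptions, not confirmed by the report):

```python
# Minimal sketch: sample karaf JVM heap usage via Jolokia's standard
# java.lang:type=Memory MBean. Endpoint and credentials are assumed defaults.
import time
import requests

URL = "http://controller-2:8181/jolokia/read/java.lang:type=Memory/HeapMemoryUsage"

while True:
    heap = requests.get(URL, auth=("admin", "admin"), timeout=10).json()["value"]
    used_gb = heap["used"] / 1024 ** 3
    max_gb = heap["max"] / 1024 ** 3
    print(f"heap used: {used_gb:.2f} GiB / {max_gb:.2f} GiB")
    time.sleep(30)
```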
Setup:
3 ODLs
3 OpenStack Controllers
3 Compute nodes
ODL RPM from upstream: python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost.noarch
Test:
Create 40 neutron resources (routers, networks, etc.) 2 at a time using Rally and delete them, over and over again. This is a long-running, low-stress test. A sketch of the loop follows below.
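The create/delete churn can be approximated outside of Browbeat+Rally with a small openstacksdk loop. This is a minimal sketch, assuming a configured clouds.yaml entry named "overcloud" (a placeholder) and using networks only; the actual test drove Rally scenarios across several resource types.

```python
# Minimal sketch: churn neutron resources (networks here) 2 at a time,
# approximating the Browbeat+Rally longevity test. The cloud name
# "overcloud" is a placeholder; real runs used Rally scenarios.
from concurrent.futures import ThreadPoolExecutor

import openstack

conn = openstack.connect(cloud="overcloud")  # assumes clouds.yaml is set up

def churn(i):
    # Create a network, then delete it immediately.
    net = conn.network.create_network(name=f"longevity-net-{i}")
    conn.network.delete_network(net)

while True:
    # 40 resources per iteration, at most 2 in flight at a time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(churn, range(40)))
```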
Additional info:
ODL Controller-0 Logs:
http://8.43.86.1:8088/smalleni/karaf-controller-0.log.tar.gz
ODL Controller-1 Logs:
http://8.43.86.1:8088/smalleni/karaf-controller-1.log.tar.gz
http://8.43.86.1:8088/smalleni/karaf-controller-1-rollover.log.tar.gz
ODL Controller-2 Logs:
http://8.43.86.1:8088/smalleni/karaf-controller-2.log.tar.gz
Is blocked by:
- CONTROLLER-1755 RaftActor lastApplied index moves backwards (Resolved)
- CONTROLLER-1756 OOM due to huge Map in ShardDataTree (Resolved)