[NETVIRT-878] CSIT should help to detect possible memory leaks leading to OOM related to non-closed transactions (and tx chains) early Created: 31/Aug/17 Updated: 22/Jan/20 |
|
| Status: | In Progress |
| Project: | netvirt |
| Component/s: | General |
| Affects Version/s: | Nitrogen |
| Fix Version/s: | Fluorine-SR2, Neon |
| Type: | Improvement | Priority: | Medium |
| Reporter: | Michael Vorburger | Assignee: | Srinivas Rachakonda |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
| Description |
|
At the start, before you install netvirt odl-netvirt-openstack, if you could add a "feature:install odl-mdsal-trace" and then after |
| Comments |
| Comment by Michael Vorburger [ 06/Dec/17 ] |
|
I'm not 100% sure if on master after we're fully through with |
| Comment by Michael Vorburger [ 08/Jan/18 ] |
|
jluhrsen has now started work related to this on Gerrit topic feature-install. |
| Comment by Jamo Luhrsen [ 08/Jan/18 ] |
|
It looks like the changes to install this mdsal trace and then netvirt afterward are causing some Essentially, the high level problem is that openstack instances are not getting ips. Digging a little deeper it doesn't look like any ideas?
|
| Comment by Michael Vorburger [ 08/Jan/18 ] |
|
jluhrsen as per private IRC chat, more as a FTR for myself:
That job (above) is for Nitrogen, but the trace:transactions is not available for Nitrogen anyway (I've never back patched it). I therefore propose to focus this only on Oxygen - even if your first phase isn't even about trace:transactions just yet, it will later have to be, so might as well focus only on one release.
The only thing in the Karaf log of your Nitrogen CSIT job for this is this error:
2018-01-08 19:00:27,797 | WARN | ender-2-thread-1 | AbstractLifeCycle | 282 - org.eclipse.jetty.util - 9.2.21.v20170120 | FAILED HttpServiceContext ... httpContext=WebAppHttpContext ... org.opendaylight.neutron.northbound-api - 374 ... java.lang.NoClassDefFoundError: javax/ws/rs/ext/MessageBodyReader
java.lang.NoClassDefFoundError: javax/ws/rs/ext/MessageBodyReader
perhaps that is causing the 404 you wrote on IRC you saw in neutron log:
DEBUG networking_odl.common.client [-] Exception from ODL: 404 Client Error: Not Found for url: http://10.30.170.91:8080/controller/nb/v2/neutron/ports
If we are seeing this NoClassDefFoundError on Oxygen master as well, and if we are only seeing it when you first feature:install the odl-mdsal-trace then we would have to dig into that... but the easiest then is probably just to wait for skitt's fix for |
| Comment by Jamo Luhrsen [ 09/Jan/18 ] |
|
if using featuresBoot, I don't see the problem loading neutron NB API, but it does show up for me even locally doing the I do like having the option of doing feature:install vs featuresBoot in our infra, so no need to stop that work. It's pretty much btw, just to be clear, we did run this job with oxygen and saw the same 14 failures: |
| Comment by Michael Vorburger [ 09/Jan/18 ] |
|
I suggest that in this issue we move forward using featuresBoot instead of feature:install. |
| Comment by Jamo Luhrsen [ 21/Feb/18 ] |
|
with this builder patch to install odl-mdsal-trace as an additional feature, and running the l2 connectivity suite with this int/test patch we get the following output:
what can we make of this so far? I think we can probably merge the jjb and test |
| Comment by Michael Vorburger [ 26/Feb/18 ] |
|
> what can we make of this so far?
a real possible transaction leak in openflowplugin which could lead to OOM on longevity, or a false positive of the TracingBroker (trace:transactions) .. watch
> I think we can probably merge the jjb and test
confirming this, that is the goal here (and we should keep this issue open until we have achieved that). But I would start merging things gradually, and not wait for the full solution, but have as a goal to enforce failing CSIT when we see any output from trace:transactions after test run within a reasonable time frame - couple of weeks, perhaps?
But a question: Is |
| Comment by Sam Hague [ 06/Apr/18 ] |
| Comment by Jamo Luhrsen [ 06/Apr/18 ] |
yeah, I need to finish this. We had it in for a bit, but it caused the logging |
| Comment by Michael Vorburger [ 06/Apr/18 ] |
|
shague yeah please don't close this one, this is NOT done - and we really have to finally do this!! |
| Comment by Jamo Luhrsen [ 07/Apr/18 ] |
|
Thanks for confirming what I said, vorburger. Just FYI, this is NOT at the top of my to-do list. I DID get |
| Comment by Jamo Luhrsen [ 07/Apr/18 ] |
|
actually, I forgot about this patch. Running it in the sandbox |
| Comment by Jamo Luhrsen [ 09/Apr/18 ] |
|
the recent two sandbox jobs I ran with this CSIT patch and odl-mdsal-trace feature have failures which we two sandbox failures: I'll run it a few more times, as that's not really any work; just a few clicks. |
| Comment by Michael Vorburger [ 25/Jul/18 ] |
|
We really should pick this old and long overdue idea up and finish it, somehow. In parallel (and, for me, with a higher short-term priority) we will also finish NETVIRT-1318 and start GENIUS-176. |
| Comment by Sam Hague [ 02/Oct/18 ] |
|
jluhrsen did we want to get this in? |
| Comment by Jamo Luhrsen [ 02/Oct/18 ] |
|
yes, but we need two things.
1) run csit with this trace:transactions stuff going to verify no weird failures. In the past, it was causing
2) modify the step to fail when too many open transactions are there. |
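A minimal sketch of what step 2 could look like, written as a plain Python helper that a CSIT teardown could call with the captured text of trace:transactions. The function names and the `max_open` parameter are hypothetical. The real output format of trace:transactions is not shown in this issue, so the sketch assumes, purely for illustration, that every non-blank line is one reported open transaction. With the default `max_open=0` it fails on any output at all, which matches the goal stated earlier in this thread.

```python
def count_open_transactions(trace_output: str) -> int:
    """Count the non-blank lines in captured trace:transactions output.

    ASSUMPTION: each non-blank line is one open-transaction report.
    The real output format is not documented in this issue.
    """
    return sum(1 for line in trace_output.splitlines() if line.strip())


def check_open_transactions(trace_output: str, max_open: int = 0) -> None:
    """Raise AssertionError if more than max_open entries are reported.

    max_open is a hypothetical threshold; 0 means "fail on any output".
    """
    found = count_open_transactions(trace_output)
    if found > max_open:
        raise AssertionError(
            f"trace:transactions reported {found} entries (allowed: {max_open})"
        )
```

A helper like this raises an exception when the limit is exceeded, so a test step that calls it would be marked as failed; how the command output is captured in the first place is left out here.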
| Comment by Abhinav Gupta [ 25/Nov/19 ] |
|
any update here? |
| Comment by Jamo Luhrsen [ 25/Nov/19 ] |
|
the work went in to add "trace:transactions" to each test case teardown, but it looks like it's not working now; maybe the feature is not installed properly any more? We'd need to find someone to own this jira now and take it forward. Once that command is working again, the next step would be to understand if extra transactions are not being closed and then mark the |
| Comment by Abhinav Gupta [ 27/Nov/19 ] |
|
Nishchya, can you please look into this? |
| Comment by Nishchya Gupta [ 22/Jan/20 ] |
|
Hi Srini, please check if it is still valid. |
| Comment by Jamo Luhrsen [ 22/Jan/20 ] |
|
this jira is not about fixing a bug that you need to check if it's still valid or not. It's about doing the work to add functionality to existing CSIT code |