[CONTROLLER-1760] Tooling to find the real root cause culprit of memory leaks related to non-closed transactions (and tx chains) Created: 28/Aug/17  Updated: 15/Nov/17  Resolved: 04/Sep/17

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Carbon
Fix Version/s: None

Type: Improvement
Reporter: Michael Vorburger Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks NETVIRT-878 CSIT should help to detect possible m... In Progress
blocks CONTROLLER-1756 OOM due to huge Map in ShardDataTree Resolved
blocks CONTROLLER-1765 trace:transactions should suppress fa... Resolved
blocks NETVIRT-883 Umbrella parent issue for grouping al... Resolved
is blocked by CONTROLLER-1764 Karaf 4: odl-mdsal-trace cannot "just... Resolved

 Description   

CONTROLLER-1756 (and derived OPNFLWPLUG-933) made me realize that we really do need some sort of "tooling" to be able to find the real root cause culprit of memory leaks related to non-closed transactions (and tx chains) ...

Originally, I thought that one could find this in HPROF heap dumps, but after thinking this through further following Tom's reply on https://lists.opendaylight.org/pipermail/mdsal-dev/2017-August/001323.html and a comment from Robert on IRC, I realized that of course you actually cannot: the "left over" (non-closed) transactions piling up on the heap no longer carry any trace of where they were originally created, so the origin of the leak, which is what must be fixed, is "lost" at that point.

It turns out that the mdsal-trace feature in controller already has a useful basis for such a feature (which is a debug utility, not something we would want in production), and I've started to work on this in:

As this currently stands, there's a chance (hope) that OPNFLWPLUG-933 fixes the root cause of CONTROLLER-1756 - but if we have this utility, we can be sure that it is the one, and we can find other non-closed transaction leaks, today and in the future. Therefore I'd like to create this tool independently of CONTROLLER-1756 and OPNFLWPLUG-933 (and am thus creating this new separate bugzilla issue).
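
To illustrate the idea described above (a heap dump cannot tell where a leaked transaction was opened, so the call site has to be recorded at creation time), here is a minimal sketch. It is not the mdsal-trace implementation; `OpenTransactionRegistry`, `TrackedTransaction` and all method names are hypothetical stand-ins and not OpenDaylight types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a transaction; NOT an OpenDaylight type. */
final class TrackedTransaction implements AutoCloseable {
    private final OpenTransactionRegistry registry;
    final String name;

    TrackedTransaction(OpenTransactionRegistry registry, String name) {
        this.registry = registry;
        this.name = name;
    }

    @Override
    public void close() {
        // Closing removes the transaction from the set of "still open" ones.
        registry.closed(this);
    }
}

/** Hypothetical registry that remembers where each open transaction was created. */
final class OpenTransactionRegistry {
    // Each still-open transaction mapped to the stack trace captured when it was opened.
    private final Map<TrackedTransaction, StackTraceElement[]> open = new ConcurrentHashMap<>();

    TrackedTransaction newTransaction(String name) {
        TrackedTransaction tx = new TrackedTransaction(this, name);
        // Capture the call stack now; this is the information a heap dump no longer has.
        open.put(tx, new Throwable().getStackTrace());
        return tx;
    }

    void closed(TrackedTransaction tx) {
        open.remove(tx);
    }

    int openCount() {
        return open.size();
    }

    /** One header line per unclosed transaction, followed by its creation stack frames. */
    List<String> report() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<TrackedTransaction, StackTraceElement[]> entry : open.entrySet()) {
            lines.add(entry.getKey().name + " opened but not closed at:");
            for (StackTraceElement frame : entry.getValue()) {
                lines.add("    " + frame);
            }
        }
        return lines;
    }
}
```

Note that in this sketch the registry itself holds a strong reference to every open transaction and captures a stack trace on every open, which is one reason such tracking suits a debug utility rather than production.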



 Comments   
Comment by Michael Vorburger [ 29/Aug/17 ]

> I've started to work on this in:
> * more coming this week

==> https://git.opendaylight.org/gerrit/#/q/topic:bug/9060

Comment by Michael Vorburger [ 29/Aug/17 ]

hello, world - so is this KEWL, or WHAT... check out the output below from this new trace:transactions topic:bug/9060 tool!

So I can find 5 transaction (thus memory) leaks JUST by installing odl-netvirt-openstack (master), and not doing ANYTHING yet... looks like our code is like Swiss Cheese, with many holes in it! It will be so much FUN to have this used e.g. in a CSIT and later during real scale testing.. and, likely, find many many more like this:

opendaylight-user@root>trace:transactions
TracingBroker found some not yet (or never..) closed transaction[chain]s!
[NB: If no stack traces are shown below, then enable transaction-debug-context-enabled in mdsaltrace_config.xml]

DataBroker : newReadOnlyTransaction()
1x transactions opened here, which are not closed:
(...)
Proxy52650ca9_3ae6_40f1_9c52_ec2047cecc51.newReadOnlyTransaction(Unknown Source)
org.opendaylight.openflowplugin.applications.topology.manager.FlowCapableTopologyProvider.isFlowTopologyExist(FlowCapableTopologyProvider.java:86)
org.opendaylight.openflowplugin.applications.topology.manager.FlowCapableTopologyProvider.start(FlowCapableTopologyProvider.java:57)
(...)

==> OPNFLWPLUG-934

DataBroker : createTransactionChain()
1x TransactionChains opened but not closed:
(...)
Proxyfaab8ae8_e5e3_4a72_a6ea_4ae4baef08d5.createTransactionChain(Unknown Source)
org.opendaylight.restconf.RestConnectorProvider.start(RestConnectorProvider.java:87)
(...)

==> NETCONF-462

1x TransactionChains opened but not closed:
(...)
Proxy52650ca9_3ae6_40f1_9c52_ec2047cecc51.createTransactionChain(Unknown Source)
org.opendaylight.openflowplugin.applications.topology.manager.OperationProcessor.<init>(OperationProcessor.java:36)

==> OPNFLWPLUG-935

1x TransactionChains opened but not closed:
(...)
Proxy5501955d_71e1_4829_a1c6_db3085042ff1.createTransactionChain(Unknown Source)
org.opendaylight.ovsdb.southbound.transactions.md.TransactionInvokerImpl.<init>(TransactionInvokerImpl.java:53)
org.opendaylight.ovsdb.southbound.SouthboundProvider.init(SouthboundProvider.java:79)
(...)

==> OVSDB-423

1x TransactionChains opened but not closed:
(...)
Proxyd6d64bb3_914d_4db8_9c2a_094278739686.createTransactionChain(Unknown Source)
org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.TransactionInvokerImpl.<init>(TransactionInvokerImpl.java:61)
org.opendaylight.ovsdb.hwvtepsouthbound.HwvtepSouthboundProvider.init(HwvtepSouthboundProvider.java:78)

==> OVSDB-424
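
For readers unfamiliar with this kind of leak, the read-only transaction finding above can be sketched as follows. This is not the actual openflowplugin code (which is not shown in this issue); `FakeBroker`, `FakeReadTx` and `TopologyCheck` are hypothetical stand-ins, and the sketch only assumes that the transaction type can be closed via try-with-resources.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical broker that counts how many read-only transactions are still open. */
final class FakeBroker {
    final AtomicInteger openTransactions = new AtomicInteger();

    FakeReadTx newReadOnlyTransaction() {
        openTransactions.incrementAndGet();
        return new FakeReadTx(this);
    }
}

/** Hypothetical read-only transaction; NOT an OpenDaylight type. */
final class FakeReadTx implements AutoCloseable {
    private final FakeBroker broker;
    private boolean closed;

    FakeReadTx(FakeBroker broker) {
        this.broker = broker;
    }

    /** Placeholder read: pretends every non-empty path exists. */
    boolean exists(String path) {
        return !path.isEmpty();
    }

    @Override
    public void close() {
        // Only the first close() releases the transaction.
        if (!closed) {
            closed = true;
            broker.openTransactions.decrementAndGet();
        }
    }
}

final class TopologyCheck {
    /** Leaky variant: the transaction is opened, used once, and never closed. */
    static boolean existsLeaky(FakeBroker broker, String path) {
        FakeReadTx tx = broker.newReadOnlyTransaction();
        return tx.exists(path);
    }

    /** Fixed variant: try-with-resources closes the transaction on every exit path. */
    static boolean existsClosed(FakeBroker broker, String path) {
        try (FakeReadTx tx = broker.newReadOnlyTransaction()) {
            return tx.exists(path);
        }
    }
}
```

Calling `existsLeaky` leaves the broker's open count one higher each time, while `existsClosed` returns it to its previous value.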

Comment by Michael Vorburger [ 31/Aug/17 ]

Status update: This tool is now available in master. https://git.opendaylight.org/gerrit/#/c/62403/ would be great to have in netvirt master to make it much easier to use e.g. by QA folks (and in order to be able to start talking about integrating it into CSIT, which would be VERY cool).

Next step: I'll now start looking at the feasibility of back patching this stuff from master to stable/carbon, so that we can run it on internal QA environments under load - and use this to help us find the real culprit of CONTROLLER-1756!

Comment by Michael Vorburger [ 31/Aug/17 ]

> have this used e.g. in a CSIT
> start talking about integrating it into CSIT

==> NETVIRT-878

Comment by Michael Vorburger [ 31/Aug/17 ]

I've just (minutes ago) been able to merge the back-patch of new https://bugs.opendaylight.org/show_bug.cgi?id=9060 tooling from master into the upstream stable/carbon branch, and with that consider this issue fully done from my side. We'll be dealing with enhancements in new issues, such as CONTROLLER-1765 and hope to update CONTROLLER-1756 with the findings obtained thanks to this new tool soon (that was the main point of all of this).

Comment by Michael Vorburger [ 15/Nov/17 ]

http://blog2.vorburger.ch/2017/09/how-to-find-transaction-related-memory.html

Generated at Wed Feb 07 19:56:23 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.