[OPNFLWPLUG-961] New OOM due to more TX leaks seen in the nova-boot scenario Created: 30/Nov/17  Updated: 06/Dec/17  Resolved: 04/Dec/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Highest
Reporter: Michael Vorburger Assignee: Michael Vorburger
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File controller-0_open-transactions.txt     Text File controller-1_open-transactions.txt     Text File controller-2_open-transactions.txt    
Issue Links:
Blocks
blocks NETVIRT-878 CSIT should help to detect possible m... In Progress
is blocked by NETVIRT-985 java.lang.OutOfMemoryError: Java heap... Resolved
Cloners
clones OVSDB-435 New OOM due to more TX leaks seen in ... Resolved

 Description   

Following the (many!) fixes that we did in NETVIRT-985, a new round of testing on the latest stable/carbon using the "nova-boot scenario" has hit an OOM again. A new "trace:transaction" output reveals more TX leaks which we need to plug; see attached.

The biggest one, with 273x open newReadOnlyTransaction and likely the cause of the new OOM this time, appears in (1) ovsdb. https://git.opendaylight.org/gerrit/#/c/65567/ seems not to have helped for that one. Curious... or did we make a mess in the downstream porting?!

Filing this in JIRA project ovsdb due to above; but the attached shows a number of other leaks in (2) openflowplugin and (3) genius.lockmanager & (4) genius.interfacemanager (x90!) which we need to plug as part of this JIRA as well.



 Comments   
Comment by Michael Vorburger [ 30/Nov/17 ]

https://git.opendaylight.org/gerrit/#/q/topic:OVSDB-435 will have all changes regarding this new OOM across ovsdb, genius and openflowplugin.

Comment by Michael Vorburger [ 30/Nov/17 ]

Extract of the openflowplugin-related leaks from the attachments, WITHOUT the x1 entries, which don't count:

A. something in DeviceContextImpl:

DataBroker : newReadOnlyTransaction()
    5x transactions opened here, which are not closed:
      (...)      org.opendaylight.controller.md.sal.dom.spi.ForwardingDOMDataBroker.newReadOnlyTransaction(ForwardingDOMDataBroker.java:42)
      (...)      org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newReadOnlyTransaction(BindingDOMDataBrokerAdapter.java:59)
      (...)      org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.getReadTransaction(DeviceContextImpl.java:244)
      org.opendaylight.openflowplugin.impl.statistics.StatisticsGatheringUtils.deleteAllKnownFlows(StatisticsGatheringUtils.java:208)
      org.opendaylight.openflowplugin.impl.services.AbstractMultipartRequestOnTheFlyCallback.startCollecting(AbstractMultipartRequestOnTheFlyCallback.java:135)
      org.opendaylight.openflowplugin.impl.services.AbstractMultipartRequestOnTheFlyCallback.onSuccess(AbstractMultipartRequestOnTheFlyCallback.java:84)
      org.opendaylight.openflowplugin.impl.services.AbstractMultipartRequestOnTheFlyCallback.onSuccess(AbstractMultipartRequestOnTheFlyCallback.java:35)
      org.opendaylight.openflowjava.protocol.impl.core.connection.OutboundQueueEntry.complete(OutboundQueueEntry.java:121)
      org.opendaylight.openflowjava.protocol.impl.core.connection.StackedSegment.completeEntry(StackedSegment.java:113)
      org.opendaylight.openflowjava.protocol.impl.core.connection.StackedSegment.pairRequest(StackedSegment.java:153)
...

B. something re Ping Pong from TransactionChainManager:

25x TransactionChains opened but not closed here:
(...)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.<init>(PingPongTransactionChain.java:98)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:47)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:27)
 (...)
 org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45)
 org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:74)
 (...)
 org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.createTxChain(TransactionChainManager.java:80)
 org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.onTransactionChainFailed(TransactionChainManager.java:278)
 org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.failTransactionChain(BindingDOMTransactionChainAdapter.java:110)
 org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.access$200(BindingDOMTransactionChainAdapter.java:31)
 org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter$3.onFailure(BindingDOMTransactionChainAdapter.java:91)
 com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
 com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
 com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
 com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
 com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
 com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransaction.onFailure(PingPongTransaction.java:65)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.transactionFailed(PingPongTransactionChain.java:293)
 org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onFailure(PingPongTransactionChain.java:243)
 com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 java.lang.Thread.run(Thread.java:748)

Comment by Michael Vorburger [ 30/Nov/17 ]

skitt I had my (very very first, actually) look inside openflowplugin today, and found an entirely different way of writing code in there than in genius and netvirt; they have their own set of TX utils! Addressing this isn't the 1h job I thought it would be.

A. is easy to fix (if it is even a real leak...), see https://git.opendaylight.org/gerrit/#/c/66128/ & Co.
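To illustrate the kind of leak reported under A (a read-only transaction that is opened and never closed), here is a minimal, self-contained sketch. It does not reproduce the Gerrit change above and does not use the real MD-SAL classes: `ReadTx`, `CountingBroker`, `leakyRead` and `safeRead` are hypothetical stand-ins, and the counter only mimics what a "trace:transaction" style report would show.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadTxLeakSketch {

    /** Hypothetical stand-in for a read-only transaction; NOT the real MD-SAL interface. */
    interface ReadTx extends AutoCloseable {
        String read(String path);

        @Override
        void close(); // narrowed so that callers need not handle a checked exception
    }

    /** Hypothetical broker that counts transactions which were opened but not closed. */
    static final class CountingBroker {
        private final AtomicInteger open = new AtomicInteger();

        ReadTx newReadOnlyTransaction() {
            open.incrementAndGet();
            return new ReadTx() {
                private boolean closed;

                @Override
                public String read(String path) {
                    return "data@" + path;
                }

                @Override
                public void close() {
                    // Only the first close() releases the transaction.
                    if (!closed) {
                        closed = true;
                        open.decrementAndGet();
                    }
                }
            };
        }

        int openCount() {
            return open.get();
        }
    }

    /** Leaky variant: the transaction is used for one read and then simply dropped. */
    static String leakyRead(CountingBroker broker, String path) {
        ReadTx tx = broker.newReadOnlyTransaction();
        return tx.read(path);
    }

    /** Safe variant: try-with-resources closes the transaction even if read() throws. */
    static String safeRead(CountingBroker broker, String path) {
        try (ReadTx tx = broker.newReadOnlyTransaction()) {
            return tx.read(path);
        }
    }

    public static void main(String[] args) {
        CountingBroker broker = new CountingBroker();
        for (int i = 0; i < 5; i++) {
            leakyRead(broker, "flows");
        }
        System.out.println("open after leaky reads: " + broker.openCount());
        for (int i = 0; i < 5; i++) {
            safeRead(broker, "flows");
        }
        System.out.println("open after safe reads:  " + broker.openCount());
    }
}
```

In this sketch the leaky calls leave the open count raised, while the try-with-resources calls leave it unchanged. Whether the same shape applies to getReadTransaction() callers in openflowplugin depends on who is meant to own that transaction, which this sketch does not try to answer.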

But for B. I don't yet see how to easily address the leaks clearly found above. As these are minor leaks compared to the other big ones we've seen, I am not planning to address this in the short term. I'll email openflowplugin-dev, but don't really expect them to fix this for us in the short term.

PS: In the longer term, and with more effort, it could perhaps be interesting to explore whether openflowplugin could be changed to use the TX utils we've now come up with in genius, like our ManagedNewTransactionRunner?
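As a rough sketch of the general idea behind a "managed" transaction runner (the runner owns the transaction lifecycle, so calling code cannot forget to submit or cancel), consider the following. This is NOT the genius ManagedNewTransactionRunner API: `WriteTx`, `CheckedConsumer`, `TxRunner` and `runAndSubmit` are hypothetical names invented for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class ManagedTxSketch {

    /** Hypothetical write transaction that must end in either submit() or cancel(). */
    static final class WriteTx {
        final List<String> writes = new ArrayList<>();
        boolean submitted;
        boolean cancelled;

        void put(String path, String value) {
            writes.add(path + "=" + value);
        }

        void submit() {
            submitted = true;
        }

        void cancel() {
            cancelled = true;
        }
    }

    /** A consumer whose body may throw a checked exception of type E. */
    @FunctionalInterface
    interface CheckedConsumer<T, E extends Exception> {
        void accept(T value) throws E;
    }

    /** Hypothetical runner: creates the transaction, runs the caller's body, then finishes it. */
    static final class TxRunner {
        final List<WriteTx> created = new ArrayList<>();

        <E extends Exception> void runAndSubmit(CheckedConsumer<WriteTx, E> body) throws E {
            WriteTx tx = new WriteTx();
            created.add(tx);
            boolean ok = false;
            try {
                body.accept(tx);
                ok = true;
            } finally {
                // Submit on success, cancel on any exception; either way the transaction ends.
                if (ok) {
                    tx.submit();
                } else {
                    tx.cancel();
                }
            }
        }

        /** Number of transactions that were neither submitted nor cancelled. */
        int openCount() {
            int open = 0;
            for (WriteTx tx : created) {
                if (!tx.submitted && !tx.cancelled) {
                    open++;
                }
            }
            return open;
        }
    }

    public static void main(String[] args) {
        TxRunner runner = new TxRunner();
        runner.runAndSubmit(tx -> tx.put("node/1", "up"));
        try {
            runner.runAndSubmit(tx -> {
                tx.put("node/2", "up");
                throw new IllegalStateException("simulated failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("body failed: " + e.getMessage());
        }
        System.out.println("transactions left open: " + runner.openCount());
    }
}
```

The design point is that the caller never holds a transaction outside the lambda, so a forgotten submit or cancel cannot happen at the call site. The real genius utility may differ in names, return types and asynchronous behaviour.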
