[NETVIRT-1295] Transaction leak in ExternalRoutersListener Created: 04/Jun/18  Updated: 22/Jan/20  Resolved: 22/Jan/20

Status: Resolved
Project: netvirt
Component/s: natservice
Affects Version/s: Fluorine
Fix Version/s: Fluorine-SR2, Neon

Type: Bug Priority: Medium
Reporter: Michael Vorburger Assignee: Karthikeyan Krishnan
Resolution: Done Votes: 0
Labels: feedback
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks NETVIRT-1089 Add trace:transactions to suite teard... In Progress

 Description   

NETVIRT-1089 found these in natservice, all of which need to be (separately) plugged:

DataBroker : newReadOnlyTransaction() 28x transactions opened here, which are not closed: (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newReadOnlyTransaction(BindingDOMDataBrokerAdapter.java:59) (...) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.read(ExternalRoutersListener.java:573) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.subnetRegisterMapping(ExternalRoutersListener.java:422) org.opendaylight.netvirt.natservice.internal.NaptSwitchHA.subnetRegisterMapping(NaptSwitchHA.java:1063) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.handleSNATForDPN(RouterDpnChangeListener.java:333) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.lambda$add$0(RouterDpnChangeListener.java:198) org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416)

as well as this:

 2x transactions opened here, which are not closed:
      (...)      org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newReadOnlyTransaction(BindingDOMDataBrokerAdapter.java:59)
      (...)      org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.read(ExternalRoutersListener.java:573)      org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.subnetRegisterMapping(ExternalRoutersListener.java:422)      org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.handleEnableSnat(ExternalRoutersListener.java:300)      org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.lambda$add$0(ExternalRoutersListener.java:281)     org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416)

and this:

 DataBroker : newWriteOnlyTransaction()
    59x transactions opened here, which are not closed:
      (...)
      org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newWriteOnlyTransaction(BindingDOMDataBrokerAdapter.java:69)
      (...)
      org.opendaylight.netvirt.natservice.internal.NatInterfaceStateChangeListener$NatInterfaceStateUpdateWorker.call(NatInterfaceStateChangeListener.java:377)
      org.opendaylight.netvirt.natservice.internal.NatInterfaceStateChangeListener$NatInterfaceStateUpdateWorker.call(NatInterfaceStateChangeListener.java:356)
      org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416)
      org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60)
      java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
      java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
      java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
      java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
      java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

and:

49x transactions opened here, which are not closed:
      (...)
      org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newWriteOnlyTransaction(BindingDOMDataBrokerAdapter.java:69)
      (...)
      org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.lambda$remove$3(ExternalRoutersListener.java:1685)
      org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:416)
      org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60)
      java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
      java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
      java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
      java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
      java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)


 Comments   
Comment by Michael Vorburger [ 05/Jun/18 ]

jluhrsen in chat with skitt he rightfully requested we set the right Affects Version here - are your NETVIRT-1089 CSIT jobs where I copy/pasted above from running on master (Fluorin) or stable/oxygen?

Comment by Jamo Luhrsen [ 05/Jun/18 ]

latest open transactions from a fluorine job:

DataBroker : newReadOnlyTransaction() 28x transactions opened here, which are not closed: (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newReadOnlyTransaction(BindingDOMDataBrokerAdapter.java:57) (...) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.read(ExternalRoutersListener.java:576) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.subnetRegisterMapping(ExternalRoutersListener.java:425) org.opendaylight.netvirt.natservice.internal.NaptSwitchHA.subnetRegisterMapping(NaptSwitchHA.java:1063) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.handleSNATForDPN(RouterDpnChangeListener.java:325) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.lambda$null$0(RouterDpnChangeListener.java:194) org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(ManagedNewTransactionRunnerImpl.java:45) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.lambda$null$1(RouterDpnChangeListener.java:193) org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(ManagedNewTransactionRunnerImpl.java:45) org.opendaylight.netvirt.natservice.internal.RouterDpnChangeListener.lambda$add$2(RouterDpnChangeListener.java:151) org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:401) org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60) java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) 2x transactions opened here, which are not closed: (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.newReadOnlyTransaction(BindingDOMDataBrokerAdapter.java:57) (...) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.read(ExternalRoutersListener.java:576) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.subnetRegisterMapping(ExternalRoutersListener.java:425) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.handleEnableSnat(ExternalRoutersListener.java:303) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.lambda$null$0(ExternalRoutersListener.java:287) org.opendaylight.genius.infra.ManagedNewTransactionRunnerImpl.callWithNewWriteOnlyTransactionAndSubmit(ManagedNewTransactionRunnerImpl.java:45) org.opendaylight.netvirt.natservice.internal.ExternalRoutersListener.lambda$add$1(ExternalRoutersListener.java:270) org.opendaylight.infrautils.jobcoordinator.internal.JobCoordinatorImpl$MainTask.runWithUncheckedExceptionLogging(JobCoordinatorImpl.java:401) org.opendaylight.infrautils.utils.concurrent.LoggingUncaughtThreadDeathContextRunnable.run(LoggingUncaughtThreadDeathContextRunnable.java:60) java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) DataBroker : createTransactionChain() 1x TransactionChains opened but not closed here: (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:72) (...) org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.TransactionInvokerImpl.<init>(TransactionInvokerImpl.java:60) org.opendaylight.ovsdb.hwvtepsouthbound.HwvtepSouthboundProvider.init(HwvtepSouthboundProvider.java:81) (...) java.util.concurrent.FutureTask.run(FutureTask.java:266) (...) java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.run(FutureTask.java:266) (...) java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748) 3x TransactionChains opened but not closed here: (...) org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.<init>(PingPongTransactionChain.java:101) org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:49) org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:28) (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:72) (...) org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.createTxChain(TransactionChainManager.java:81) org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.activateTransactionManager(TransactionChainManager.java:109) org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.lazyTransactionManagerInitialization(DeviceContextImpl.java:686) org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.instantiateServiceInstance(DeviceContextImpl.java:629) org.opendaylight.openflowplugin.impl.lifecycle.GuardedContextImpl.instantiateServiceInstance(GuardedContextImpl.java:86) java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:891) org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl.instantiateServiceInstance(ContextChainImpl.java:74) org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648) java.util.ArrayList.forEach(ArrayList.java:1257) org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.startServices(ClusterSingletonServiceGroupImpl.java:645) org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.cleanupCandidateOwnershipChanged(ClusterSingletonServiceGroupImpl.java:506) org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lockedOwnershipChanged(ClusterSingletonServiceGroupImpl.java:453) org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:433) org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:234) org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:23) org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:44) org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:33) org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:38) (...) 1x TransactionChains opened but not closed here: (...) org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45) org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:72) (...) org.opendaylight.ovsdb.southbound.transactions.md.TransactionInvokerImpl.<init>(TransactionInvokerImpl.java:54) org.opendaylight.ovsdb.southbound.SouthboundProvider.init(SouthboundProvider.java:89) (...) java.util.concurrent.FutureTask.run(FutureTask.java:266) (...) java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.run(FutureTask.java:266) (...) java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748)
Comment by Jamo Luhrsen [ 05/Jun/18 ]

the below comment was from the fluorine job I ran on 6/5. I don't remember what I used for the original
logs. sorry

Comment by Michael Vorburger [ 05/Jun/18 ]

Thanks; I've set the Affects Version field to Fluorine.

Can I suggest that we treat only the leak in netvirt.natservice here, and ignore the other leaks shown above in this issue.

Comment by Chetan Arakere Gowdru [ 06/Jun/18 ]

Hi All,

I have addressed the issue on ReadTransaction not getting closed in ExternalRouterListener.

https://git.opendaylight.org/gerrit/#/c/72710/

Where I can see these kind of exception when CSIT(test-netvirt-current) is ranned as I didn't observe any such error in below ODL log location.

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-queens-gate-stateful-fluorine/549/odl_1/  

Comment by Michael Vorburger [ 06/Jun/18 ]

> Where I can see these kind of exception when CSIT(test-netvirt-current) is run

I'm not sure if and how you could see this on a CSIT for an un-merged changed. If I were you, I wouldn't worry and just get it in... If this is still seen, this JIRA issue will simply get re-openend! But if you do want to check, reach out to jluhrsen re. where you can see the trace:transaction results on a CSIT.

xcheara could you plz cherry-pick c/72710 to stable/oxygen as well (once and after it has +1/2 and merged to master) ? I just had a look and the bad code was already there as well.

Comment by Michael Vorburger [ 11/Jun/18 ]

jluhrsen xcheara I was just discussing this one with skitt who mentioned re. the newWriteOnlyTransaction that they were already using the SingleTransactionDataBroker there (the c/72710/ fix is just for the ReadTransaction), so that's very curious... jluhrsen are we sure that we're doing "trace:transaction" at the "end" in that CIST in NETVIRT-1089, when the system should be "calmed down" and "done" ? I understand this is a vague- but it's of course "normal" for some transactions to still be open while it's still "doing something". Otherwise, I guess what could be happening is that for some reason things are "stuck" inside a job. We would ideally need to see a jstack taken at the same time the "trace:transaction" was taken to be really sure what's going on - would that be possible?

May be we should split this Jira into separate issues for the WriteOnlyTransaction VS the ReadTransaction problems listed above - does someone want to do that? 

Comment by Jamo Luhrsen [ 11/Jun/18 ]

vorburger, we are doing the trace:transaction at the end of every test case, as well as in every suite's teardown stage So,
I'm guessing there will be times where the system is still chugging away at something so we'd end up getting some listing of
transactions open that should be and will be closed shortly after. But, that means someone could go sift through all the test
cases if they wanted to see when a transaction first showed up and see when it eventually went away, if that's the idea.

However, you could also just look at the final suite's teardown which happened aprox 3s after what I think is the last external event
that could possibly tickle things, which is just a router list. And before that, it's just other reads for some time. It was probably
~30s before there was anything other than reads happening (a security group delete happened).

as for a "jstack", where is that done and what are the details for doing it? Maybe I can add it to my patch and run
this job in the sandbox again.

Comment by Chetan Arakere Gowdru [ 19/Jun/18 ]

jluhrsen, based on above logs, we have addressed the closing on

newReadOnlyTransaction(). Will it be possible to ran the suite with this patchset changes - https://git.opendaylight.org/gerrit/#/c/72710/

Comment by Jamo Luhrsen [ 19/Jun/18 ]

xcheara, you want the suite with "trace:transactions" to be run, correct?

Comment by Chetan Arakere Gowdru [ 20/Jun/18 ]

Yes. Basically want to verify no more occurrence of Read Transcations opened with this patch.

Comment by Sam Hague [ 03/Oct/18 ]

xcheara is thsi still being worked?

Comment by Chetan Arakere Gowdru [ 04/Oct/18 ]

Yes Sam. I will rebase this patch and make it ready for review/merge.

Comment by Chetan Arakere Gowdru [ 22/Jan/20 ]

This identified issue been merged.

https://git.opendaylight.org/gerrit/#/c/netvirt/+/72710/

Please veriify if it's still an open issue in magnesium.

Generated at Wed Feb 07 20:23:43 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.