[OPNFLWPLUG-982] Suspected TransactionChain leak in TransactionChainManager (or fix false positive in TracingBroker) Created: 26/Feb/18  Updated: 12/Apr/18  Resolved: 12/Apr/18

Status: Resolved
Project: OpenFlowPlugin
Component/s: None
Affects Version/s: Oxygen
Fix Version/s: Fluorine

Type: Bug Priority: High
Reporter: Michael Vorburger Assignee: Stephen Kitt
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks NETVIRT-878 CSIT should help to detect possible m... In Progress

 Description   

NETVIRT-878 brings up OPNFLWPLUG-940 (or was it OPNFLWPLUG-961 or OPNFLWPLUG-933 ?) again:

This time we need to either fix this, or if we confirm this is not a real issue, then move this issue to the Controller project and change the org.opendaylight.controller.md.sal.trace.dom.impl.TracingBroker to ignore it:

TracingBroker found some not yet (or never..) closed transaction[chain]s!
[NB: If no stack traces are shown below, then enable transaction-debug-context-enabled in mdsaltrace_config.xml]
 
DataBroker : createTransactionChain() 3x TransactionChains opened but not closed here: (...) 
    org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.<init>(PingPongTransactionChain.java:104) 
    org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:49)    org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:28) (...) 
    org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45) 
    org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:74) (...) 
    org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.createTxChain(TransactionChainManager.java:81) 
    org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.activateTransactionManager(TransactionChainManager.java:109) 
    org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.lazyTransactionManagerInitialization(DeviceContextImpl.java:649) 
    org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.instantiateServiceInstance(DeviceContextImpl.java:592) 
    org.opendaylight.openflowplugin.impl.lifecycle.GuardedContextImpl.instantiateServiceInstance(GuardedContextImpl.java:86) 
    java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:891) 
    org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl.instantiateServiceInstance(ContextChainImpl.java:74) 
    org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
    java.util.ArrayList.forEach(ArrayList.java:1257) 
    org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.startServices(ClusterSingletonServiceGroupImpl.java:645) 
    org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.cleanupCandidateOwnershipChanged(ClusterSingletonServiceGroupImpl.java:506) 
    org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lockedOwnershipChanged(ClusterSingletonServiceGroupImpl.java:453) 
    org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:433) 
    org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:234) 
    org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:23) 
    org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:44) 
    org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:33) 
    org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:38) (...)


 Comments   
Comment by Anil Vishnoi [ 06/Mar/18 ]

skitt Stephen, are you looking at this issue ? If not, please unassign so we can triage it.

Comment by Stephen Kitt [ 06/Mar/18 ]

I am indeed working on it, Avishnoi; I get the impression there’s a bunch of “synchronisation by magic” (aka GuardedBy statements with no corresponding reality) which could explain contribute to this, although it could also be perfectly innocent (if three devices are connected, there should be three outstanding transaction chains, right?).

Comment by Michael Vorburger [ 03/Apr/18 ]

Avishnoi and skitt I meant to speak to you F2F last week at ONS about this, but forgot.. can we move on this?

Comment by Anil Vishnoi [ 06/Apr/18 ]

skitt This class was reworked meany times to take care of mutliple race condition. Plugin creates TransactionChainManager per device and it is used in various places in openflowplugin to process the southbound events coming from device like

(1) Flow/Group/Port/Queue/Table statistics --> they all are separate events and can come in any order and in parallel. Even Flow/Group statistics are divided in multiple openflow messages and each message end up generating once transaction. 

(2) Port status messages --> these are asyn messages and can be send anytime from switch if any port goes down.

(3) Flow removed messages –> Async message send by switch whenever flow is removed from switch.

(4) I may be forgetting one or two more events.

 

So although TransactionChainManager instance is created per node, it needs to deal with the concurrency because of these various parallel events processing and hence "GuardedBy". But i would love someone to review the synchronization in this class and push a patch if it can be handle in better way.

Regarding the open transaction chain, given that openflowplugin create TransactionChainManager instance per device, it opens the transaction chain per node after device is initialized properly.  I believe that's the reason you see these open transaction chains. I believe this can be further optimized by creating the transaction chain lazily, but as far as i can see this is not really a leak as such. 

 

Comment by Michael Vorburger [ 11/Apr/18 ]

skitt if you agree with Avishnoi then let us close this?

I would then implement CONTROLLER-1765 and then NETVIRT-878 could use that.

Comment by Stephen Kitt [ 12/Apr/18 ]

Yes, I agree. After careful analysis, there doesn’t seem to be anything wrong in the TransactionChainManager, although I still have an upcoming patch to clarify things and help out static analysis tools.

Comment by Stephen Kitt [ 12/Apr/18 ]

This is a false positive.

Generated at Wed Feb 07 20:33:53 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.