[OPNFLWPLUG-982] Suspected TransactionChain leak in TransactionChainManager (or fix false positive in TracingBroker) Created: 26/Feb/18 Updated: 12/Apr/18 Resolved: 12/Apr/18 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | None |
| Affects Version/s: | Oxygen |
| Fix Version/s: | Fluorine |
| Type: | Bug | Priority: | High |
| Reporter: | Michael Vorburger | Assignee: | Stephen Kitt |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
NETVIRT-878 brings up This time we need to either fix this, or if we confirm this is not a real issue, then move this issue to the Controller project and change the org.opendaylight.controller.md.sal.trace.dom.impl.TracingBroker to ignore it: TracingBroker found some not yet (or never..) closed transaction[chain]s!
[NB: If no stack traces are shown below, then enable transaction-debug-context-enabled in mdsaltrace_config.xml]
DataBroker : createTransactionChain() 3x TransactionChains opened but not closed here: (...)
org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.<init>(PingPongTransactionChain.java:104)
org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:49) org.opendaylight.controller.md.sal.dom.broker.impl.PingPongDataBroker.createTransactionChain(PingPongDataBroker.java:28) (...)
org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.<init>(BindingDOMTransactionChainAdapter.java:45)
org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataBrokerAdapter.createTransactionChain(BindingDOMDataBrokerAdapter.java:74) (...)
org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.createTxChain(TransactionChainManager.java:81)
org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.activateTransactionManager(TransactionChainManager.java:109)
org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.lazyTransactionManagerInitialization(DeviceContextImpl.java:649)
org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.instantiateServiceInstance(DeviceContextImpl.java:592)
org.opendaylight.openflowplugin.impl.lifecycle.GuardedContextImpl.instantiateServiceInstance(GuardedContextImpl.java:86)
java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:891)
org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl.instantiateServiceInstance(ContextChainImpl.java:74)
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
java.util.ArrayList.forEach(ArrayList.java:1257)
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.startServices(ClusterSingletonServiceGroupImpl.java:645)
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.cleanupCandidateOwnershipChanged(ClusterSingletonServiceGroupImpl.java:506)
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lockedOwnershipChanged(ClusterSingletonServiceGroupImpl.java:453)
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:433)
org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:234)
org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:23)
org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:44)
org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:33)
org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:38) (...)
|
| Comments |
| Comment by Anil Vishnoi [ 06/Mar/18 ] |
|
skitt Stephen, are you looking at this issue ? If not, please unassign so we can triage it. |
| Comment by Stephen Kitt [ 06/Mar/18 ] |
|
I am indeed working on it, Avishnoi; I get the impression there’s a bunch of “synchronisation by magic” (aka GuardedBy statements with no corresponding reality) which could explain contribute to this, although it could also be perfectly innocent (if three devices are connected, there should be three outstanding transaction chains, right?). |
| Comment by Michael Vorburger [ 03/Apr/18 ] |
|
Avishnoi and skitt I meant to speak to you F2F last week at ONS about this, but forgot.. can we move on this? |
| Comment by Anil Vishnoi [ 06/Apr/18 ] |
|
skitt This class was reworked meany times to take care of mutliple race condition. Plugin creates TransactionChainManager per device and it is used in various places in openflowplugin to process the southbound events coming from device like (1) Flow/Group/Port/Queue/Table statistics --> they all are separate events and can come in any order and in parallel. Even Flow/Group statistics are divided in multiple openflow messages and each message end up generating once transaction. (2) Port status messages --> these are asyn messages and can be send anytime from switch if any port goes down. (3) Flow removed messages –> Async message send by switch whenever flow is removed from switch. (4) I may be forgetting one or two more events.
So although TransactionChainManager instance is created per node, it needs to deal with the concurrency because of these various parallel events processing and hence "GuardedBy". But i would love someone to review the synchronization in this class and push a patch if it can be handle in better way. Regarding the open transaction chain, given that openflowplugin create TransactionChainManager instance per device, it opens the transaction chain per node after device is initialized properly. I believe that's the reason you see these open transaction chains. I believe this can be further optimized by creating the transaction chain lazily, but as far as i can see this is not really a leak as such.
|
| Comment by Michael Vorburger [ 11/Apr/18 ] |
|
skitt if you agree with Avishnoi then let us close this? I would then implement |
| Comment by Stephen Kitt [ 12/Apr/18 ] |
|
Yes, I agree. After careful analysis, there doesn’t seem to be anything wrong in the TransactionChainManager, although I still have an upcoming patch to clarify things and help out static analysis tools. |
| Comment by Stephen Kitt [ 12/Apr/18 ] |
|
This is a false positive. |