[BGPCEP-486] PingPongTransaction race when handling data tree change Created: 24/Jun/16 Updated: 03/Mar/19 Resolved: 15/Aug/16 |
|
| Status: | Resolved |
| Project: | bgpcep |
| Component/s: | BGP |
| Affects Version/s: | Bugzilla Migration |
| Fix Version/s: | Bugzilla Migration |
| Type: | Bug | ||
| Reporter: | Kevin Wang | Assignee: | Kevin Wang |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| External issue ID: | 6111 | ||||||||||||
| Description |
|
Race is observed:
2016-06-24 17:55:24,877 | ERROR | lt-dispatcher-36 | DataTreeChangeListenerActor | 149 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2cf0c767
raced with transacion PingPongTransaction {delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@f493600}
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:136)[122:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium] |
| Comments |
| Comment by Milos Fabian [ 27/Jun/16 ] |
|
Could you please add steps to reproduce, or at least some more information on when this issue happens? Optionally, could you try it with the latest Beryllium release (SR2)? Thanks, |
| Comment by Kevin Wang [ 11/Jul/16 ] |
|
Cannot reproduce for now. Will reopen if more information about this bug is observed. |
| Comment by Kevin Wang [ 20/Jul/16 ] |
|
The node is trying to sync the bgp-linkstate topology with other nodes; however, the transaction fails due to a timeout.
2016-07-19 17:19:59,376 | ERROR | CommitFutures-10 | AbstractTopologyBuilder | 327 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Topology builder for KeyedInstanceIdentifier {targetType=interface org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=CBB-linkstate-topology-3]]]]} failed in transaction DOM-CHAIN-12-333
at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
Then AbstractTopologyBuilder tries to cancel the PingPongTransaction, but the cancel fails because PingPongTransactionChain implements it as "Unsupported".
2016-07-19 17:19:59,394 | ERROR | lt-dispatcher-27 | DataTreeChangeListenerActor | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2b6c9375
Eventually the transaction remains locked, which causes the creation of new transactions to fail.
2016-07-19 17:19:59,395 | ERROR | lt-dispatcher-27 | DataTreeChangeListenerActor | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2b6c9375
raced with transacion PingPongTransaction {delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@2b8e58a3}
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:136)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium] |
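The sequence in these log lines can be mimicked with a toy model. The sketch below is not the OpenDaylight code: `ChainLockSketch`, `ToyChain`, `ToyTx`, and their methods are hypothetical names, and the single-slot lock is an assumption about how a ping-pong chain guards its one open transaction. It only illustrates why a cancel that throws leaves every later allocation failing with a "raced" error.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model only; every name here is hypothetical, not an OpenDaylight API. */
class ChainLockSketch {

    /** A transaction handed out by the toy chain. */
    static final class ToyTx {
        final int id;

        ToyTx(int id) {
            this.id = id;
        }

        /** Mirrors the behaviour reported in this issue: cancel is unsupported. */
        boolean cancel() {
            throw new UnsupportedOperationException("cancel is not supported");
        }
    }

    /** Chain that allows a single outstanding transaction (single-threaded demo). */
    static final class ToyChain {
        private final AtomicReference<ToyTx> locked = new AtomicReference<>();
        private int nextId = 1;

        /** Allocates a transaction; fails if the previous one still holds the slot. */
        ToyTx newTransaction() {
            ToyTx tx = new ToyTx(nextId++);
            if (!locked.compareAndSet(null, tx)) {
                throw new IllegalStateException(
                    "new transaction " + tx.id + " raced with transaction " + locked.get().id);
            }
            return tx;
        }

        /** Releases the slot if it is held by the given transaction. */
        void submit(ToyTx tx) {
            locked.compareAndSet(tx, null);
        }
    }

    /** Returns true if the chain can still allocate after a cancel attempt that throws. */
    static boolean usableAfterFailedCancel() {
        ToyChain chain = new ToyChain();
        ToyTx tx = chain.newTransaction();
        try {
            tx.cancel(); // throws, so the slot is never released
        } catch (UnsupportedOperationException e) {
            // the error path ends here with the slot still held
        }
        try {
            chain.newTransaction();
            return true;
        } catch (IllegalStateException e) {
            return false; // every later allocation fails the same way
        }
    }

    public static void main(String[] args) {
        System.out.println("usable after failed cancel: " + usableAfterFailedCancel());
    }
}
```

In this model the only way out is to release the slot through `submit` or to discard the chain object altogether, which matches the observation that no new transaction could be created afterwards.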
| Comment by Kevin Wang [ 20/Jul/16 ] |
|
Attachment karaf.log.77.zip has been added with description: karaf log |
| Comment by Claudio David Gasparini [ 21/Jul/16 ] |
|
Hi Kevin, regarding your last comment, I am not sure whether the attachment corresponds to this bug. Regards |
| Comment by Kevin Wang [ 21/Jul/16 ] |
|
Hi Claudio,
Sorry for the unclear description. The last attachment (karaf log) is for this bug. The problem happens in a 3-node cluster. It happened on the Beryllium release, but I suppose it will also happen on SR2 or Boron, as the related code hasn't changed.
The following is my observation:
1. ODL is trying to sync the BGP-LS topology with other nodes.
2. ODL tries to get the member-2 shard transaction:
3. The shard keeps timing out initially (in the log, it times out on member-2).
4. Eventually, the sync transaction fails because the previous transaction timed out:
5. Then the BGP topology provider tries to cancel the transaction; however, cancel() is not implemented for PingPongTransactionChain. This could be either a PingPongTransactionChain bug (cancel() should be implemented properly) or an AbstractTopologyBuilder bug (it should not invoke cancel() when a transaction fails):
6. Consequently, no new PingPongTransaction can be created. Because the previous PingPongTransaction was not properly canceled, the PingPongTransactionChain was never unlocked, and every new PingPongTransaction is reported as having "raced" with the problematic one. Engineering believes that data changes on BGP-LS will never happen after this.
In summary, AbstractTopologyBuilder tries to cancel a PingPongTransaction, which PingPongTransactionChain does not support, and that leads to the problem. Please let me know if you have any other thoughts. |
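One way to read the second option in step 5 is that the owner of the chain should, on a failed transaction, discard the whole chain and allocate a fresh one instead of calling cancel(). The sketch below illustrates that pattern with hypothetical classes (`ChainResetSketch`, `ToyChain`, `ToyBuilder`, `onFailure`). It is an assumption-laden toy, not the change merged for this issue and not the AbstractTopologyBuilder code; in particular, the toy holds the transaction slot until the commit result arrives, which is a simplification.

```java
import java.util.function.Supplier;

/** Toy model only; every name here is hypothetical, not an OpenDaylight API. */
class ChainResetSketch {

    /** Chain with one transaction slot that is freed only by submit() or close(). */
    static final class ToyChain implements AutoCloseable {
        private boolean slotHeld;
        private boolean closed;

        synchronized void newTransaction() {
            if (closed) {
                throw new IllegalStateException("chain is closed");
            }
            if (slotHeld) {
                throw new IllegalStateException("raced with open transaction");
            }
            slotHeld = true;
        }

        synchronized void submit() {
            slotHeld = false;
        }

        @Override
        public synchronized void close() {
            closed = true;
            slotHeld = false;
        }
    }

    /** Listener-side owner of the chain. */
    static final class ToyBuilder {
        private final Supplier<ToyChain> factory;
        private ToyChain chain;
        private int resets;

        ToyBuilder(Supplier<ToyChain> factory) {
            this.factory = factory;
            this.chain = factory.get();
        }

        /** Starts a write; in this toy the slot stays held until a result arrives. */
        void write() {
            chain.newTransaction();
        }

        /** Success path: release the slot on the current chain. */
        void onSuccess() {
            chain.submit();
        }

        /** Failure path: no cancel(); close the broken chain and start over. */
        void onFailure() {
            chain.close();
            chain = factory.get();
            resets++;
        }

        int resets() {
            return resets;
        }
    }

    public static void main(String[] args) {
        ToyBuilder builder = new ToyBuilder(ToyChain::new);
        builder.write();
        builder.onFailure();
        builder.write(); // succeeds because it runs on the replacement chain
        System.out.println("resets: " + builder.resets());
    }
}
```

The design choice here is that failure handling never touches the individual transaction: the broken chain is closed as a unit, so no stale lock can survive into later writes.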
| Comment by Kevin Wang [ 21/Jul/16 ] |
|
Confirmed with Robert, "PingPongDataBroker does not support cancelling transaction. It really If a submitted transaction fails, the entire TransactionChain becomes |
| Comment by Kevin Wang [ 27/Jul/16 ] |
| Comment by Robert Varga [ 03/Aug/16 ] |
|
PingPong side of fix: https://git.opendaylight.org/gerrit/43058 |
| Comment by Milos Fabian [ 15/Aug/16 ] |
|
stable/boron: https://git.opendaylight.org/gerrit/#/c/43928/ |