[BGPCEP-486] PingPongTransaction race when handling data tree change Created: 24/Jun/16  Updated: 03/Mar/19  Resolved: 15/Aug/16

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Kevin Wang Assignee: Kevin Wang
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive karaf.log.77.zip    
Issue Links:
Blocks
blocks BGPCEP-491 bgp-ls update not being shown in the ... Resolved
blocks BGPCEP-493 Topology freezes or slows down due to... Resolved
External issue ID: 6111

 Description   

Race is observed

2016-06-24 17:55:24,877 | ERROR | lt-dispatcher-36 | DataTreeChangeListenerActor | 149 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2cf0c767
java.lang.IllegalStateException: New transaction PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@535763c8}

raced with transacion PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@f493600}

at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:136)[122:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.allocateTransaction(PingPongTransactionChain.java:148)[122:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.newReadWriteTransaction(PingPongTransactionChain.java:328)[122:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.newReadWriteTransaction(BindingDOMTransactionChainAdapter.java:62)[124:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.AbstractTopologyBuilder.onDataTreeChanged(AbstractTopologyBuilder.java:130)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41)[124:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:53)[149:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:37)[149:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:34)[145:org.opendaylight.controller.sal-clustering-commons:1.3.0.Beryllium]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)[134:com.typesafe.akka.actor:2.3.14]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)[134:com.typesafe.akka.actor:2.3.14]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)[134:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)[134:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.invoke(ActorCell.scala:487)[134:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)[134:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.run(Mailbox.scala:220)[134:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[134:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[131:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[131:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[131:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[131:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]



 Comments   
Comment by Milos Fabian [ 27/Jun/16 ]

Could you please add steps to reproduce or at least some more info when this issue happens? Optionally, could you try it with a latest Beryllium release (SR2)?

Thanks,
Milos

Comment by Kevin Wang [ 11/Jul/16 ]

Cannot reproduce for now. Will reopen if more information about this bug is observed.

Comment by Kevin Wang [ 20/Jul/16 ]

The node is trying to sync the bgp-linkstate topology with other nodes, however, the transaction fails due to timeout.

2016-07-19 17:19:59,376 | ERROR | CommitFutures-10 | AbstractTopologyBuilder | 327 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Topology builder for KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=CBB-linkstate-topology-3]]]]}

failed in transaction DOM-CHAIN-12-333
TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@mtsnjv9sdbc12-eth1-0.sdcp2.aic.cip.att.com:2550/), Path(/user/shardmanager-operational/member-2-shard-topology-operational/shard-member-3-chn-25-txn-334-1468948738750#-1973141357)]] after [30000 ms]]]}

at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:96)[41:org.opendaylight.yangtools.util:0.8.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:209)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:44)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:117)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)[38:com.google.guava:18.0.0]
at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:60)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at akka.dispatch.OnComplete.internal(Future.scala:247)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.OnComplete.internal(Future.scala:245)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@mtsnjv9sdbc12-eth1-0.sdcp2.aic.cip.att.com:2550/), Path(/user/shardmanager-operational/member-2-shard-topology-operational/shard-member-3-chn-25-txn-334-1468948738750#-1973141357)]] after [30000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[123:com.typesafe.akka.actor:2.3.14]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_76]
2016-07-19 17:19:59,376 | ERROR | CommitFutures-10 | AbstractTopologyBuilder | 327 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Failed to propagate change by listener org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder@4907aa42
2016-07-19 17:19:59,393 | WARN | lt-dispatcher-27 | AbstractTopologyBuilder | 327 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Data change org.opendaylight.controller.md.sal.binding.impl.LazyDataTreeModification@64d3333d was not completely propagated to listener org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder@4907aa42, aborting
java.lang.IllegalStateException: Previous transaction member-3-chn-25-txn-335-1468948769013 is not ready yet
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy$Allocated.checkReady(TransactionChainProxy.java:85)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy.allocateWriteTransaction(TransactionChainProxy.java:192)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy.newReadWriteTransaction(TransactionChainProxy.java:173)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction.createTransaction(DOMBrokerReadWriteTransaction.java:49)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction.createTransaction(DOMBrokerReadWriteTransaction.java:22)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.getSubtransaction(AbstractDOMBrokerTransaction.java:57)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerWriteTransaction.delete(AbstractDOMBrokerWriteTransaction.java:88)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerWriteTransaction.delete(AbstractDOMBrokerWriteTransaction.java:35)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.spi.ForwardingDOMDataReadWriteTransaction.delete(ForwardingDOMDataReadWriteTransaction.java:64)[110:org.opendaylight.controller.sal-core-spi:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.spi.ForwardingDOMDataReadWriteTransaction.delete(ForwardingDOMDataReadWriteTransaction.java:28)[110:org.opendaylight.controller.sal-core-spi:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.AbstractWriteTransaction.doDelete(AbstractWriteTransaction.java:130)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMWriteTransactionAdapter.delete(BindingDOMWriteTransactionAdapter.java:73)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder$NodeHolder.syncState(LinkstateTopologyBuilder.java:187)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder$NodeHolder.access$600(LinkstateTopologyBuilder.java:160)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder.putNode(LinkstateTopologyBuilder.java:478)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder.createPrefix(LinkstateTopologyBuilder.java:825)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder.createObject(LinkstateTopologyBuilder.java:853)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder.createObject(LinkstateTopologyBuilder.java:121)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.AbstractTopologyBuilder.routeChanged(AbstractTopologyBuilder.java:165)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.AbstractTopologyBuilder.onDataTreeChanged(AbstractTopologyBuilder.java:134)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:53)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:37)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:34)[134:org.opendaylight.controller.sal-clustering-commons:1.3.0.Beryllium]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.invoke(ActorCell.scala:487)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.run(Mailbox.scala:220)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]

Then in AbstractTopologyBuilder, it tries to cancel the PingPongTransaction, while the cancel fails as PingPongTransactionChain implements it as "Unsupported".

2016-07-19 17:19:59,394 | ERROR | lt-dispatcher-27 | DataTreeChangeListenerActor | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2b6c9375
java.lang.UnsupportedOperationException: Transaction cancellation is not supported
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$4.cancel(PingPongTransactionChain.java:349)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.AbstractWriteTransaction.doCancel(AbstractWriteTransaction.java:138)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMWriteTransactionAdapter.cancel(BindingDOMWriteTransactionAdapter.java:88)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.AbstractTopologyBuilder.onDataTreeChanged(AbstractTopologyBuilder.java:137)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:53)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:37)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:34)[134:org.opendaylight.controller.sal-clustering-commons:1.3.0.Beryllium]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.invoke(ActorCell.scala:487)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.run(Mailbox.scala:220)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]

Eventually the transaction is locked and causing new transaction fail to be created.

2016-07-19 17:19:59,395 | ERROR | lt-dispatcher-27 | DataTreeChangeListenerActor | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2b6c9375
java.lang.IllegalStateException: New transaction PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@14239319}

raced with transacion PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@2b8e58a3}

at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:136)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.allocateTransaction(PingPongTransactionChain.java:148)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.newReadWriteTransaction(PingPongTransactionChain.java:328)[111:org.opendaylight.controller.sal-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.newReadWriteTransaction(BindingDOMTransactionChainAdapter.java:62)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.bgpcep.bgp.topology.provider.AbstractTopologyBuilder.onDataTreeChanged(AbstractTopologyBuilder.java:130)[327:org.opendaylight.bgpcep.bgp-topology-provider:0.5.0.Beryllium]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41)[113:org.opendaylight.controller.sal-binding-broker-impl:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:53)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:37)[138:org.opendaylight.controller.sal-distributed-datastore:1.3.0.Beryllium]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:34)[134:org.opendaylight.controller.sal-clustering-commons:1.3.0.Beryllium]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.ActorCell.invoke(ActorCell.scala:487)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.Mailbox.run(Mailbox.scala:220)[123:com.typesafe.akka.actor:2.3.14]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]

Comment by Kevin Wang [ 20/Jul/16 ]

Attachment karaf.log.77.zip has been added with description: karaf log

Comment by Claudio David Gasparini [ 21/Jul/16 ]

Hi Kevin, regarding your last comment not sure if the attachment correspond to this bug or not..
If it something new, please fill a new bug, notice please, that when a new bug is opened we expect further information like steps to reproduce the bug, what were you testing, version of controller, if you were working on a single node, clustering, 3 node clustering, etc... As more information better are the chance to trace the issue quickly.

Regards

Comment by Kevin Wang [ 21/Jul/16 ]

Hi Claudio,

Sorry for the unclear description. The last attachment (karaf log) is for this bug. The problem happens in a 3-node cluster. It happened on Beryllium release. But I suppose it will also happen on SR2 or Boron as the related code hasn't changed.

The following is my observation:

1. ODL is trying to sync BGP-LS topology with other nodes.

2. ODL tried to get member-2 shard transaction:
2016-07-19 17:19:45,617 | WARN | lt-dispatcher-27 | OperationLimiter | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Failed to acquire operation permit for transaction member-3-chn-25-txn-335-1468948769013
2016-07-19 17:19:50,618 | WARN | lt-dispatcher-27 | OperationLimiter | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Failed to acquire operation permit for transaction member-3-chn-25-txn-335-1468948769013
2016-07-19 17:19:55,648 | WARN | lt-dispatcher-27 | OperationLimiter | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Failed to acquire operation permit for transaction member-3-chn-25-txn-335-1468948769013
2016-07-19 17:19:59,027 | ERROR | ult-dispatcher-4 | TransactionChainProxy | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Tx: member-3-chn-25-txn-335-1468948769013 - ready future failed for previous Tx member-3-chn-25-txn-335-1468948769013
2016-07-19 17:19:59,027 | WARN | ult-dispatcher-2 | ConcurrentDOMDataBroker | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Tx: DOM-CHAIN-12-333 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@mtsnjv9sdbc12-eth1-0.sdcp2.aic.cip.att.com:2550/), Path(/user/shardmanager-operational/member-2-shard-topology-operational/shard-member-3-chn-25-txn-334-1468948738750#-1973141357)]] after [30000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[123:com.typesafe.akka.actor:2.3.14]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)[120:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[123:com.typesafe.akka.actor:2.3.14]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[123:com.typesafe.akka.actor:2.3.14]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_76]

3. The shard keeps timing out initially (in the log, it times out on member-2)

4. Eventually, the sync transaction fails due to the previous transaction timed out:
2016-07-19 17:19:59,393 | WARN | lt-dispatcher-27 | AbstractTopologyBuilder | 327 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Data change org.opendaylight.controller.md.sal.binding.impl.LazyDataTreeModification@64d3333d was not completely propagated to listener org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder@4907aa42, aborting
java.lang.IllegalStateException: Previous transaction member-3-chn-25-txn-335-1468948769013 is not ready yet

5. Then the BGP topology provider tried to cancel the transaction, however, the cancel() is not implemented for PingPongTransactionChain (this could either be a PingPongTransactionChain bug, that the cancel() should be implemented properly, or a AbstractTopologyBuilder bug, that it should not invoke the cancel() method when a transaction fails):
2016-07-19 17:19:59,394 | ERROR | lt-dispatcher-27 | DataTreeChangeListenerActor | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.Beryllium | Error notifying listener org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter@2b6c9375
java.lang.UnsupportedOperationException: Transaction cancellation is not supported

6. Consequently, all the following new PingPongTransaction cannot be created. Because the previous PingPongTransaction was not properly canceled, the PingPongTransactionChain was never unlocked. All the new PingPongTransaction was said "raced" with this problematic PingPongTransaction. Engineering believes that the data change on BGP-LS will never happen after this.

In summary, the AbstractTopologyBuilder, it is trying to cancel a PingPongTransactionChain, which is not supported. That leads to the problem.

Please let me know if you have any other thought.

Comment by Kevin Wang [ 21/Jul/16 ]

Confirmed with Robert,

"PingPongDataBroker does not support cancelling transaction. It really
cannot, as it packs multiple transactions into one.

If a submitted transaction fails, the entire TransactionChain becomes
unusable (all subsequent transactions will fail). When that happens, the
TransactionChain needs to be close()d and a new one needs to be
instantiated."

Comment by Kevin Wang [ 27/Jul/16 ]

https://git.opendaylight.org/gerrit/#/c/42278/

Comment by Robert Varga [ 03/Aug/16 ]

PingPong side of fix: https://git.opendaylight.org/gerrit/43058

Comment by Milos Fabian [ 15/Aug/16 ]

stable/boron: https://git.opendaylight.org/gerrit/#/c/43928/

Generated at Wed Feb 07 19:13:12 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.