[CONTROLLER-1623] Error creating READ_WRITE transaction on shard topology Created: 24/Mar/17  Updated: 25/Jul/23  Resolved: 14/Nov/18

Status: Verified
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Claudio David Gasparini Assignee: Claudio David Gasparini
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8075

 Description   

https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/207/archives/odl3_karaf.log.gz

2017-03-15 07:20:23,262 | INFO | lt-dispatcher-21 | Shard | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | Peer address for peer member-2-shard-toaster-operational set to akka.tcp://opendaylight-cluster-data@10.29.12.203:2550/user/shardmanager-operational/member-2-shard-toaster-operational
2017-03-15 07:20:23,298 | WARN | lt-dispatcher-39 | ConcurrentDOMDataBroker | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Tx: DOM-CHAIN-4-526 Error during phase CAN_COMMIT, starting Abort
java.lang.Exception: Error creating READ_WRITE transaction on shard topology
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:217)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:195)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:39)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:135)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Transaction ReadWriteShardDataTreeTransaction{id=member-3-datastore-operational-fe-0-chn-5-txn-526-0, closed=false, snapshot=MutableDataTree [modification=NodeModification [identifier=(urn:ietf:params:xml:ns:netconf:base:1.0)data, modificationType=NONE, childModification={}]]} is open
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.getSnapshot(ShardDataTreeTransactionChain.java:43)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.newReadWriteTransaction(ShardDataTreeTransactionChain.java:62)
at org.opendaylight.controller.cluster.datastore.ShardDataTree.newReadWriteTransaction(ShardDataTree.java:492)
at org.opendaylight.controller.cluster.datastore.ShardTransactionActorFactory.newShardTransaction(ShardTransactionActorFactory.java:71)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:659)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:647)
at org.opendaylight.controller.cluster.datastore.Shard.handleCreateTransaction(Shard.java:624)
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:290)
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:268)
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.



 Comments   
Comment by Robert Varga [ 27/Mar/17 ]

Need more info on this: can you provide logs with DEBUG on org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain?

Comment by Claudio David Gasparini [ 27/Mar/17 ]

this are logs from today

https://logs.opendaylight.org/sandbox/jenkins091/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/6/archives/odl1_karaf.log.gz

running the same test but using patch
https://git.opendaylight.org/gerrit/#/c/53810/
seems to fix/hide the bug

https://logs.opendaylight.org/sandbox/jenkins091/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/2/archives/odl1_karaf.log.gz

I tried with 1 and 100K prefixes but the issue is not reproduced, only with the 1M.
Tomorrow I' ll try the test but with required debug log, but not sure I will achieve it, there will be too much logs...

Comment by Claudio David Gasparini [ 28/Mar/17 ]

requested logs

https://logs.opendaylight.org/sandbox/jenkins091/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/13/archives/

Comment by Robert Varga [ 18/Apr/17 ]

This looks like some trouble was encountered by odl2 just around the time that transaction was allocated:

ODL1:
2017-03-28 11:19:14,044 | DEBUG | lt-dispatcher-23 | ShardDataTreeTransactionChain | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Allocated read-write transaction member-2-datastore-operational-fe-0-chn-5-txn-585-0 snapshot org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@274968e9
2017-03-28 11:19:14,051 | WARN | ult-dispatcher-4 | ClusterCoreDaemon | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.13.68:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.29.12.39:2550, status = Up), Member(address = akka.tcp://opendaylight-cluster-data@10.29.13.13:2550, status = Up)]. Node roles [member-1]
2017-03-28 11:19:14,051 | INFO | ult-dispatcher-4 | ShardManager | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName

{name=member-2}, address: akka.tcp://opendaylight-cluster-data@10.29.12.39:2550
2017-03-28 11:19:14,051 | INFO | lt-dispatcher-46 | ShardManager | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-2}

, address: akka.tcp://opendaylight-cluster-data@10.29.12.39:2550

ODL2:
2017-03-28 11:19:10,166 | WARN | on-dispatcher-54 | OperationLimiter | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Failed to acquire operation permit for transaction member-2-datastore-operational-fe-0-chn-5-txn-585-0
2017-03-28 11:19:12,335 | WARN | lt-dispatcher-85 | ClusterCoreDaemon | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.12.39:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.29.13.68:2550, status = Up)]
. Node roles [member-2]
2017-03-28 11:19:12,335 | INFO | lt-dispatcher-71 | ShardManager | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName

{name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.13.68:2550
2017-03-28 11:19:12,335 | INFO | lt-dispatcher-18 | ShardManager | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-1}

, address: akka.tcp://opendaylight-cluster-data@10.29.13.68:2550
2017-03-28 11:19:12,336 | INFO | lt-dispatcher-71 | EntityOwnershipShard | 226 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-entity-ownership-operational: onPeerDown: PeerDown [memberName=member-1, peerId=member-1-shard-entity-ownership-operational]
2017-03-28 11:19:14,241 | INFO | lt-dispatcher-22 | EmptyLocalActorRef | 201 - com.typesafe.akka.slf4j - 2.4.17 | Message [org.opendaylight.controller.cluster.datastore.messages.CreateTransactionReply] from Actor[akka.tcp://opendaylight-cluster-data@10.29.13.68:2550/user/shardmanager-operational/membe
r-1-shard-topology-operational#-1491023140] to Actor[akka://opendaylight-cluster-data/temp/$k2b] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2017-03-28 11:19:14,242 | INFO | lt-dispatcher-22 | kka://opendaylight-cluster-data) | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.12.39:2550] - Ignoring received gossip status from unreachable [UniqueAddress(akka.tcp://opendaylight-cluster-data@10.29.13.68:2550
,1066496970)]
2017-03-28 11:19:14,264 | WARN | lt-dispatcher-37 | ConcurrentDOMDataBroker | 233 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Tx: DOM-CHAIN-4-585 Error during phase CAN_COMMIT, starting Abort
java.lang.Exception: Error creating READ_WRITE transaction on shard topology
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:217)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:195)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:39)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:135)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Transaction ReadWriteShardDataTreeTransaction{id=member-2-datastore-operational-fe-0-chn-5-txn-585-0, closed=false, snapshot=MutableDataTree [modification=NodeModification [identifier=(urn:ietf:params:xml:ns:netconf:base:1.0)data, modificationType=NONE, childModification=

{ }]]}

is open

Comment by Robert Varga [ 14/Nov/18 ]

cdgasparini is this bug still present?

Comment by Claudio David Gasparini [ 14/Nov/18 ]

Not longer able to reproduce it.

Generated at Wed Feb 07 19:56:01 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.