[OVSDB-324] operational datastore is not getting updated after the previous transaction failed. Created: 18/Mar/16  Updated: 30/Oct/17  Resolved: 31/Aug/16

Status: Resolved
Project: ovsdb
Component/s: Southbound.hw_vtep
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: suneel verma Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File karaf.log    
External issue ID: 5555

 Description   

The following message keeps coming in the plugin logs.
Because of this operational ds is not updated with latest info from the switch.

2016-03-18 11:39:06,279 | WARN | n-invoker-impl-0 | TransactionInvokerImpl | 228 - org.opendaylight.ovsdb.hwvtepsouthbound-impl - 1.2.2.SNAPSHOT | Exception invoking Transaction:
java.lang.IllegalStateException: Previous transaction member-1-chn-9-txn-1605-1458297541317 is not ready yet
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy$Allocated.checkReady(TransactionChainProxy.java:85)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy.allocateWriteTransaction(TransactionChainProxy.java:192)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.TransactionChainProxy.newReadWriteTransaction(TransactionChainProxy.java:173)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction.createTransaction(DOMBrokerReadWriteTransaction.java:49)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction.createTransaction(DOMBrokerReadWriteTransaction.java:22)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.getSubtransaction(AbstractDOMBrokerTransaction.java:57)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction.read(DOMBrokerReadWriteTransaction.java:37)[146:org.opendaylight.controller.sal-distributed-datastore:1.3.1.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.AbstractForwardedTransaction.doRead(AbstractForwardedTransaction.java:63)[121:org.opendaylight.controller.sal-binding-broker-impl:1.3.1.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMReadWriteTransactionAdapter.read(BindingDOMReadWriteTransactionAdapter.java:31)[121:org.opendaylight.controller.sal-binding-broker-impl:1.3.1.SNAPSHOT]
at org.opendaylight.ovsdb.hwvtepsouthbound.HwvtepSouthboundUtil.readNode(HwvtepSouthboundUtil.java:68)[228:org.opendaylight.ovsdb.hwvtepsouthbound-impl:1.2.2.SNAPSHOT]
at org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepManagerUpdateCommand.updateManager(HwvtepManagerUpdateCommand.java:60)[228:org.opendaylight.ovsdb.hwvtepsouthbound-impl:1.2.2.SNAPSHOT]
at org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepManagerUpdateCommand.execute(HwvtepManagerUpdateCommand.java:54)[228:org.opendaylight.ovsdb.hwvtepsouthbound-impl:1.2.2.SNAPSHOT]
at org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepOperationalCommandAggregator.execute(HwvtepOperationalCommandAggregator.java:46)[228:org.opendaylight.ovsdb.hwvtepsouthbound-impl:1.2.2.SNAPSHOT]
at org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.TransactionInvokerImpl.run(TransactionInvokerImpl.java:88)[228:org.opendaylight.ovsdb.hwvtepsouthbound-impl:1.2.2.SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_85]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_85]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]



 Comments   
Comment by Anil Vishnoi [ 17/May/16 ]

Hi Vishal,

Any update on this bug ?

Thanks
Anil

Comment by Vishal Thapar [ 17/May/16 ]

I have a workaround for it already merged in hwvtep plugin. This issue happens when there is a bug in hwvtep plugin code [or could be in OVSDB code too], resulting in exception. TransactionInvoker retries failed transactions but if there is bug it will fail everytime. My workaround catches exceptions. A proper fix could be to catch non-DataStore exceptions in TransactionInvoker and not not retry in that case. Not sure if it is good idea or not.

Comment by Hideyuki Tai [ 24/Aug/16 ]

We've faced a similar issue twice yesterday. It seems to me that the issue halted the thread of the run() method of the TransactionInvokerImpl class of org.opendaylight.ovsdb.southbound.transactions.md package. After the issue happened, the OVSDB Southbound plugin didn't process any update notifications.

That's really critical issue. So I've changed the "Importance" of the bug into "blocker".

We've seen this issue when we tested the legacy NetVirt with 3-ODL-node cluster on the stable/beryllium branch. We've faced this issue twice in three test runs. Therefore, I think it is not a rare event. It would happen often.

Just before the thread was halted, ODL output the following WARN messages. I don't think it shows the direct cause of the issue. However, I can guess something bad happens around ShardDataTreeTransactionChain.

2016-08-23 22:10:57,909 | WARN | lt-dispatcher-15 | ConcurrentDOMDataBroker | 166 - org.opendaylight.controller.sal-distributed-datastore - 1.3.4.SNAPSHOT | Tx: DOM-CHAIN-0-58 Error during phase CAN_COMMIT, starting Abort
java.lang.Exception: Error creating READ_WRITE transaction on shard topology
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:222)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:200)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:40)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:135)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:245)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: Transaction ReadWriteShardDataTreeTransaction{id=shard-member-2-chn-1-txn-59-1472004651115, closed=false, snapshot=MutableDataTree [modification=NodeModification [identifier=(urn:ietf:params:xml:ns:netconf:base:1.0)data, modificationType=NONE, childModification={}]]} is open
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.getSnapshot(ShardDataTreeTransactionChain.java:37)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.newReadWriteTransaction(ShardDataTreeTransactionChain.java:54)
at org.opendaylight.controller.cluster.datastore.ShardDataTree.newReadWriteTransaction(ShardDataTree.java:118)
at org.opendaylight.controller.cluster.datastore.ShardTransactionActorFactory.newShardTransaction(ShardTransactionFactory.java:49)
at org.opendaylight.controller.cluster.datastore.Shard.createTypedTransactionActor(Shard.java:560)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:592)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:571)
at org.opendaylight.controller.cluster.datastore.Shard.handleCreateTransaction(Shard.java:543)
at org.opendaylight.controller.cluster.datastore.Shard.onReceiveCommand(Shard.java:233)
at akka.persistence.UntypedPersistentActor.onReceive(Eventsourced.scala:430)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:534)
at akka.persistence.Recovery$State$class.process(Recovery.scala:30)
at akka.persistence.ProcessorImpl$$anon$2.process(Processor.scala:103)
at akka.persistence.ProcessorImpl$$anon$2.aroundReceive(Processor.scala:114)
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265)
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)
at akka.persistence.Eventsourced$$anon$2.doAroundReceive(Eventsourced.scala:82)
at akka.persistence.Eventsourced$$anon$2.aroundReceive(Eventsourced.scala:78)
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)
at akka.persistence.UntypedPersistentActor.aroundReceive(Eventsourced.scala:428)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
... 5 more

2016-08-23 22:10:57,932 | WARN | n-invoker-impl-0 | SouthboundUtil | 253 - org.opendaylight.ovsdb.southbound-impl - 1.2.5.SNAPSHOT | Read Operational/DS for Node failed! KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=ovsdb:1]]], org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node[key=NodeKey [_nodeId=Uri [_value=ovsdb://uuid/69d5898c-a35e-4639-9df7-7fbda31d5c48]]]]}

ReadFailedException{message=Error executeRead ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[

{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}

]/node/node[

{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/69d5898c-a35e-4639-9df7-7fbda31d5c48}

], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[

{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}

]/node/node[

{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/69d5898c-a35e-4639-9df7-7fbda31d5c48}

], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.lang.Exception: Error creating READ_WRITE transaction on shard topology]]}
at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.executeRead(NoOpTransactionContext.java:71)
at org.opendaylight.controller.cluster.datastore.TransactionProxy$1.invoke(TransactionProxy.java:92)
at org.opendaylight.controller.cluster.datastore.TransactionContextWrapper.executePriorTransactionOperations(TransactionContextWrapper.java:132)
at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory.onFindPrimaryShardFailure(AbstractTransactionContextFactory.java:97)
at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory.access$100(AbstractTransactionContextFactory.java:35)
at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory$1.onComplete(AbstractTransactionContextFactory.java:123)
at org.opendaylight.controller.cluster.datastore.AbstractTransactionContextFactory$1.onComplete(AbstractTransactionContextFactory.java:117)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:245)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.Exception: Error creating READ_WRITE transaction on shard topology
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:222)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:200)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:40)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:135)
... 17 more
Caused by: java.lang.IllegalStateException: Transaction ReadWriteShardDataTreeTransaction{id=shard-member-2-chn-1-txn-59-1472004651115, closed=false, snapshot=MutableDataTree [modification=NodeModification [identifier=(urn:ietf:params:xml:ns:netconf:base:1.0)data, modificationType=NONE, childModification={}]]} is open
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.getSnapshot(ShardDataTreeTransactionChain.java:37)
at org.opendaylight.controller.cluster.datastore.ShardDataTreeTransactionChain.newReadWriteTransaction(ShardDataTreeTransactionChain.java:54)
at org.opendaylight.controller.cluster.datastore.ShardDataTree.newReadWriteTransaction(ShardDataTree.java:118)
at org.opendaylight.controller.cluster.datastore.ShardTransactionActorFactory.newShardTransaction(ShardTransactionFactory.java:49)
at org.opendaylight.controller.cluster.datastore.Shard.createTypedTransactionActor(Shard.java:560)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:592)
at org.opendaylight.controller.cluster.datastore.Shard.createTransaction(Shard.java:571)
at org.opendaylight.controller.cluster.datastore.Shard.handleCreateTransaction(Shard.java:543)
at org.opendaylight.controller.cluster.datastore.Shard.onReceiveCommand(Shard.java:233)
at akka.persistence.UntypedPersistentActor.onReceive(Eventsourced.scala:430)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:534)
at akka.persistence.Recovery$State$class.process(Recovery.scala:30)
at akka.persistence.ProcessorImpl$$anon$2.process(Processor.scala:103)
at akka.persistence.ProcessorImpl$$anon$2.aroundReceive(Processor.scala:114)
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:265)
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)
at akka.persistence.Eventsourced$$anon$2.doAroundReceive(Eventsourced.scala:82)
at akka.persistence.Eventsourced$$anon$2.aroundReceive(Eventsourced.scala:78)
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)
at akka.persistence.UntypedPersistentActor.aroundReceive(Eventsourced.scala:428)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
... 5 more

Comment by Hideyuki Tai [ 24/Aug/16 ]

Attachment karaf.log has been added with description: Log messages when we faced the issue that the run() thread was halted

Comment by Hideyuki Tai [ 24/Aug/16 ]

We faced the issue again. And, we've observed that IllegalStateException happens at calling the ReadWriteTransaction.submit() in the run method of the TransactionInvokerImpl. The reason for the IllegalStateException was that the submit() method was called when the state of the DOMBrokerTransactionChain was FAILED. The reason that the chain was at FAILED state was that a previous calling to the submit() on the chain failed. As of now, I'm not sure why a previous calling to the submit() was failed.

What I know is that just before the IllegalStateException happened, ReadWriteTransaction.read() failed during the phase CAN_COMMIT. I'm not sure even why the read() method needs to enter "CAN_COMMIT", that's a little bit weird to me, since I think reading data itself doesn't require any commit.

For more details, please see the attached karaf.log.

To know what kind of Exception happens in the run method of the TransactionInvoker, in our local environment, we've changed the condition of the catch block in the line 98 of the TransactionInvokerImpl.java as follows to catch any Exception.

[southbound/southbound-impl/src/main/java/org/opendaylight/ovsdb/southbound/transactions/md/TransactionInvokerImpl.java]

Current: } catch (InterruptedException e)

{ Changed: }

catch (Exception e) {

The attached karaf.log was the log file when we observed that the run() thread was halted with the above code change.

Comment by Anil Vishnoi [ 30/Aug/16 ]

stable/boron : https://git.opendaylight.org/gerrit/#/c/44806/
master : https://git.opendaylight.org/gerrit/#/c/44090/2

Comment by Hideyuki Tai [ 31/Aug/16 ]

I think we can close this bug. I mean I think it's ok to change the status to RESOLVED.

The current status is we've already merged the change for this bug into the master and stable/boron branches. And, we've not seen this issue after that.

Generated at Wed Feb 07 20:36:06 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.