[NETVIRT-1260] OLFE: Conflicting modification for path... interfaces/interface/interface... name=ed2e5068-31bd-4bcd-9479-0860a9eba76e Created: 15/May/18 Updated: 19/Nov/19 Resolved: 03/Oct/18 |
|
| Status: | Resolved |
| Project: | netvirt |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sam Hague | Assignee: | Abhinav Gupta |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | csit:exception | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
2018-05-11T17:06:10,067 | WARN | opendaylight-cluster-data-shard-dispatcher-32 | ShardDataTree | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-default-config: Store Tx member-1-datastore-config-fe-0-txn-8784-0: Conflicting modification for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ed2e5068-31bd-4bcd-9479-0860a9eba76e}].
2018-05-11T17:06:10,083 | INFO | ForkJoinPool-1-worker-2 | InterfaceAddWorkerOnElanInterface | 367 - org.opendaylight.netvirt.elanmanager-impl - 0.7.0.SNAPSHOT | Handling elan interface bc5e0034-f81a-47ff-904f-49376d4312e7 add for elan 703b8c5f-3d4e-4ed2-b628-8860ba3de59e
2018-05-11T17:06:10,070 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | LocalThreePhaseCommitCohort | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-8784-0 on backend
org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed.
at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:740) ~[246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) [230:org.opendaylight.controller.sal-akka-raft:1.8.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) [238:org.opendaylight.controller.sal-clustering-commons:1.8.0.SNAPSHOT]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:275) [43:com.typesafe.akka.persistence:2.5.11]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) [238:org.opendaylight.controller.sal-clustering-commons:1.8.0.SNAPSHOT]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:608) [40:com.typesafe.akka.actor:2.5.11]
at akka.actor.Actor.aroundReceive(Actor.scala:517) [40:com.typesafe.akka.actor:2.5.11]
at akka.actor.Actor.aroundReceive$(Actor.scala:515) [40:com.typesafe.akka.actor:2.5.11]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:273) [43:com.typesafe.akka.persistence:2.5.11]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:691) [43:com.typesafe.akka.persistence:2.5.11]
at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:192) [43:com.typesafe.akka.persistence:2.5.11]
at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:191) [43:com.typesafe.akka.persistence:2.5.11]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:273) [43:com.typesafe.akka.persistence:2.5.11]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:590) [40:com.typesafe.akka.actor:2.5.11]
at akka.actor.ActorCell.invoke(ActorCell.scala:559) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.Mailbox.run(Mailbox.scala:224) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [40:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [40:com.typesafe.akka.actor:2.5.11]
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkConflicting(SchemaAwareApplyOperation.java:81) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkNotConflicting(SchemaAwareApplyOperation.java:114) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:178) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:135) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:315) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:138) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:101) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:732) ~[246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
... 30 more
|
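The stack trace in the description reflects MD-SAL's optimistic concurrency control: at commit time, each transaction validates that the subtree it touched was not modified by a transaction that committed after its snapshot was taken. A minimal sketch of that check in plain Java (a toy model, not MD-SAL code; all names here are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the optimistic-concurrency check behind
// ConflictingModificationAppliedException: each transaction remembers the
// version of the subtree it read, and commit fails if another transaction
// bumped that version first.
public class OptimisticLockSketch {
    static final AtomicInteger subtreeVersion = new AtomicInteger(0);

    /** Simulated commit: succeeds only if nobody changed the subtree since we read it. */
    static boolean commit(int versionSeenAtRead) {
        return subtreeVersion.compareAndSet(versionSeenAtRead, versionSeenAtRead + 1);
    }

    public static void main(String[] args) {
        int txA = subtreeVersion.get(); // tx A snapshots the tree
        int txB = subtreeVersion.get(); // tx B snapshots the same version
        boolean aOk = commit(txA);      // first committer wins
        boolean bOk = commit(txB);      // second committer sees a conflict
        System.out.println("txA=" + aOk + " txB=" + bOk); // txA=true txB=false
    }
}
```

In the real datastore the losing transaction surfaces as the OptimisticLockFailedException seen above, with the conflicting path logged by ShardDataTree.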
| Comments |
| Comment by Faseela K [ 17/May/18 ] |
|
One thing I always wanted to ask: when this OptimisticLockFailedException occurs, it would be good if we could at least get the path at which the exception was raised. Otherwise it is difficult to tell from the logs which datastore write actually resulted in this error; we are just inferring things from the log line immediately before the exception, which may not always be true.
vorburger : Can any help be done in this area? |
| Comment by Faseela K [ 17/May/18 ] |
|
Assuming that the conflicting-modification log immediately above this is the reason for the exception: 2018-05-11T17:06:10,067 | WARN | opendaylight-cluster-data-shard-dispatcher-32 | ShardDataTree | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-default-config: Store Tx member-1-datastore-config-fe-0-txn-8784-0: Conflicting modification for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ed2e5068-31bd-4bcd-9479-0860a9eba76e}] Do we have any recent change in NeutronPortChangeListener in neutronvpn that has recently landed? |
| Comment by Michael Vorburger [ 22/May/18 ] |
|
> Any help can be done in this area?
|
| Comment by Sam Hague [ 24/May/18 ] |
| Comment by Faseela K [ 29/May/18 ] |
|
I had a look at the neutronvpn code. The reason it throws this exception even on a single node is that the DJC in neutronvpn uses a different key than the DJC in interface-manager, and hence the error is thrown. Even if we fix the DJC key, that fix will not work for a cluster. I can think of two solutions :
|
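The point about differing DJC keys can be sketched with a toy keyed coordinator (not the real DataStoreJobCoordinator API; all names here are illustrative): jobs sharing a key are serialized, so when two modules enqueue writes to the same interface under different keys, nothing orders them and the commit-time conflict above becomes possible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy keyed job coordinator: one single-thread lane per key, so same-key
// jobs run strictly in order, while jobs under different keys may interleave.
public class KeyedCoordinatorSketch {
    static final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    static Future<?> enqueue(String jobKey, Runnable job) {
        return lanes.computeIfAbsent(jobKey, k -> Executors.newSingleThreadExecutor())
                    .submit(job);
    }

    public static void main(String[] args) throws Exception {
        StringBuilder order = new StringBuilder();
        // same key: guaranteed serialization, hence no datastore race
        enqueue("iface-ed2e5068", () -> order.append("A"));
        enqueue("iface-ed2e5068", () -> order.append("B")).get();
        System.out.println(order); // AB
        lanes.values().forEach(ExecutorService::shutdown);
    }
}
```

Under this model, neutronvpn and interface-manager submitting under different keys is equivalent to two independent lanes racing on the same interface node.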
| Comment by Shashidhar R [ 30/May/18 ] |
|
We analyzed the logs and below are our observations from the ACL point of view: (1) From the logs, OptimisticLockFailedException is observed for the ietf-interfaces path shown below (with the latest logs attached): org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ecb03cb6-8a45-4b07-b760-3066f6017a5d}] (2) ACL only reads ietf-interfaces for doing validations; it does not modify interfaces. Since it doesn't modify interfaces, the OptimisticLockFailedException observed above is not from the ACL module.
I also discussed this with Abhinav from the neutronvpn point of view. Below are the details:
Modification of SGs (like addition or deletion of rules) is handled independently of ietf-interfaces in both the Neutronvpn and ACL modules. Assigning the bug to Faseela for further analysis from the interface-manager point of view. |
| Comment by Abhinav Gupta [ 30/May/18 ] |
|
Hi Faseela, regarding the two possible solutions you've suggested:
So currently, NeutronVPN creates/deletes the interface and updates (adds) the InterfaceACL augmentation for security updates, while Inf-Mgr updates the parentRefs.
Since inf-mgr has the entire data needed to create the interface upon receiving the external-id info from OVSDB, can we create the interface at that point itself, and handle deletes based on the same? The ACL augmentation updates today read the entire interface object and then do a PUT (which looks like a leftover from when the entire object was fetched for updating parent-refs). This can be changed to a MERGE, and we can write directly on the augmentation path identifier.
This would align with the single-writer principle. What do you think? |
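The PUT-vs-MERGE distinction above can be sketched in plain Java (a toy map-backed store, not the MD-SAL WriteTransaction API; all names here are illustrative): put replaces the whole node, silently dropping sibling data such as parent-refs written by another module, while merge only overlays the keys it carries.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an interface node as a flat map of child fields, contrasting
// whole-node replacement (put) with a keyed overlay (merge).
public class PutVsMergeSketch {
    static Map<String, String> node = new HashMap<>();

    static void put(Map<String, String> replacement) {
        node = new HashMap<>(replacement);  // replaces the entire node
    }

    static void merge(Map<String, String> overlay) {
        node.putAll(overlay);               // touches only the keys supplied
    }

    public static void main(String[] args) {
        node.put("parent-refs", "tap-port-1");            // written by inf-mgr
        merge(Map.of("interface-acl", "sg-updated"));     // overlay keeps parent-refs
        System.out.println(node.containsKey("parent-refs")); // true
        put(Map.of("interface-acl", "sg-updated"));       // replacement drops parent-refs
        System.out.println(node.containsKey("parent-refs")); // false
    }
}
```

This is why a MERGE scoped to the augmentation path is the safer operation when another writer owns sibling children of the same interface.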
| Comment by Faseela K [ 30/May/18 ] |
|
Interface-manager will have trouble determining the interface type (trunk-port or sub-port), as I don't see that information coming from the southbound. If nova can supply it while creating the VM, this can be done; could you please confirm? Otherwise the solution you have proposed is not feasible, so please pick one of the two solutions I have given. Thanks, Faseela |
| Comment by Faseela K [ 30/May/18 ] |
|
Let us decide on the preferred approach from the two solutions specified. Assigning the bug to Abhinav to decide on the solution; based on whether the change is needed in neutron-vpn or interface-manager, we can change the assignee later on. |
| Comment by Faseela K [ 30/May/18 ] |
|
shague : One quick question: is this yet another exception appearing in the log, or are there functional impacts? Looking at the code, we are using DJC retries, so if two threads are doing transaction.merge(), we might get the exception on the initial attempt, but subsequent retries would have gone through. Is that the case? If so, do we just want to enhance DJC to not print the whole exception stack trace on initial attempts, and only do so if all retries fail? vorburger : Does this seem valid? |
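The retry behaviour described above can be sketched in plain Java (a toy model, not the DJC/infrautils API; OptimisticLockConflict and runWithRetries are illustrative names): intermediate conflicts are logged quietly, and the full stack trace is printed only after the final retry fails.

```java
// Toy retry loop: retry the transactional work on an optimistic-lock
// conflict, printing the full stack trace only when all retries fail.
public class RetrySketch {
    static class OptimisticLockConflict extends RuntimeException {
        OptimisticLockConflict(String msg) { super(msg); }
    }

    interface TxWork { void run() throws OptimisticLockConflict; }

    static boolean runWithRetries(TxWork work, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                work.run();
                return true;
            } catch (OptimisticLockConflict e) {
                if (attempt == maxRetries) {
                    e.printStackTrace();  // full trace only on final failure
                } else {
                    System.out.println("attempt " + attempt + " conflicted, retrying");
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // conflicts twice, then succeeds: no stack trace is printed
        boolean ok = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new OptimisticLockConflict("node children was modified by other transaction");
            }
        }, 3);
        System.out.println("committed=" + ok + " after " + calls[0] + " attempts");
    }
}
```

If the CSIT log shows the stack trace only once, that is consistent with this pattern: the first attempt conflicted and a retry then succeeded.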
| Comment by Michael Vorburger [ 31/May/18 ] |
|
k.faseela I am pretty sure that I have already done exactly what you propose quite a while ago, check out:
If you have reason to believe that this is not working as we would like it to, can you open a new Jira in INFRAUTILS with details of a case where you think it is printing the whole exception stack trace for initial trials? I don't see how. |
| Comment by Abhinav Gupta [ 31/May/18 ] |
|
I looked into the Karaf logs for a few builds picked randomly between 510 and 539. This exception doesn't show up. |
| Comment by Abhinav Gupta [ 31/May/18 ] |
|
k.faseela, the exception here, as per build 509, occurs only once throughout the log, where LocalThreePhaseCommitCohort prints an ERROR message.
|
| Comment by Abhinav Gupta [ 14/Jun/18 ] |
|
Hi shague, as this exception was last seen in build 509, do we need to keep tracking this one?
|