[NETVIRT-1260] OLFE: Conflicting modification for path... interfaces/interface/interface... name=ed2e5068-31bd-4bcd-9479-0860a9eba76e Created: 15/May/18  Updated: 19/Nov/19  Resolved: 03/Oct/18

Status: Resolved
Project: netvirt
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Sam Hague Assignee: Abhinav Gupta
Resolution: Cannot Reproduce Votes: 0
Labels: csit:exception
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File odl1_karaf.421.log.tar.xz    
Issue Links:
Relates
relates to NETVIRT-1259 OLFE: Conflicting modification for pa... Resolved

 Description   

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-queens-upstream-stateful-fluorine/421/odl_1/odl1_karaf.log.gz

2018-05-11T17:06:10,067 | WARN  | opendaylight-cluster-data-shard-dispatcher-32 | ShardDataTree                    | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-default-config: Store Tx member-1-datastore-config-fe-0-txn-8784-0: Conflicting modification for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ed2e5068-31bd-4bcd-9479-0860a9eba76e}].
2018-05-11T17:06:10,083 | INFO  | ForkJoinPool-1-worker-2 | InterfaceAddWorkerOnElanInterface | 367 - org.opendaylight.netvirt.elanmanager-impl - 0.7.0.SNAPSHOT | Handling elan interface bc5e0034-f81a-47ff-904f-49376d4312e7 add for elan 703b8c5f-3d4e-4ed2-b628-8860ba3de59e 
2018-05-11T17:06:10,070 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | LocalThreePhaseCommitCohort      | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-8784-0 on backend
org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed.
	at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:740) ~[246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:778) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:725) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:808) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:84) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleReadyLocalTransaction(ShardCommitCoordinator.java:200) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.Shard.handleReadyLocalTransaction(Shard.java:731) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:333) [246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270) [230:org.opendaylight.controller.sal-akka-raft:1.8.0.SNAPSHOT]
	at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44) [238:org.opendaylight.controller.sal-clustering-commons:1.8.0.SNAPSHOT]
	at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:275) [43:com.typesafe.akka.persistence:2.5.11]
	at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104) [238:org.opendaylight.controller.sal-clustering-commons:1.8.0.SNAPSHOT]
	at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:608) [40:com.typesafe.akka.actor:2.5.11]
	at akka.actor.Actor.aroundReceive(Actor.scala:517) [40:com.typesafe.akka.actor:2.5.11]
	at akka.actor.Actor.aroundReceive$(Actor.scala:515) [40:com.typesafe.akka.actor:2.5.11]
	at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:273) [43:com.typesafe.akka.persistence:2.5.11]
	at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:691) [43:com.typesafe.akka.persistence:2.5.11]
	at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:192) [43:com.typesafe.akka.persistence:2.5.11]
	at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:191) [43:com.typesafe.akka.persistence:2.5.11]
	at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:273) [43:com.typesafe.akka.persistence:2.5.11]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:590) [40:com.typesafe.akka.actor:2.5.11]
	at akka.actor.ActorCell.invoke(ActorCell.scala:559) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.Mailbox.run(Mailbox.scala:224) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [40:com.typesafe.akka.actor:2.5.11]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [40:com.typesafe.akka.actor:2.5.11]
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkConflicting(SchemaAwareApplyOperation.java:81) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkNotConflicting(SchemaAwareApplyOperation.java:114) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:178) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:135) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:315) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:138) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:101) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:307) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:290) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:132) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35) ~[446:org.opendaylight.yangtools.yang-data-impl:2.0.3]
	at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:732) ~[246:org.opendaylight.controller.sal-distributed-datastore:1.8.0.SNAPSHOT]
	... 30 more


 Comments   
Comment by Faseela K [ 17/May/18 ]

One thing I always wanted to ask is, when this OptimistickLockFailedException comes, it would be good if we can atleast get the path at which the exception has come. It is difficult to track from logs otherwise to understand which datastore write exactly resulted in this error.

We are just assuming things from the immediate log before the exception, which may not be always true.

 

vorburger : Any help can be done in this area?

Comment by Faseela K [ 17/May/18 ]

Assuming that the conflicting modification log immediately above this is the reason for this exception :

2018-05-11T17:06:10,067 | WARN | opendaylight-cluster-data-shard-dispatcher-32 | ShardDataTree | 246 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-default-config: Store Tx member-1-datastore-config-fe-0-txn-8784-0: Conflicting modification for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[\{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ed2e5068-31bd-4bcd-9479-0860a9eba76e}].

do we have any recent change in NeutronPortChangeListener in neutronvpn that has recently landed?

Comment by Michael Vorburger [ 22/May/18 ]

> Any help can be done in this area?

CONTROLLER-1830 is taking care of this, from what I understood.

Comment by Sam Hague [ 24/May/18 ]

still seen: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-queens-upstream-stateful-fluorine/509/odl_1/odl1_karaf.log.gz

Comment by Faseela K [ 29/May/18 ]

I had a look at the neutronvpn code.

The reason why it  throws exception even for single node is that DJC in neutron vpn uses a different key than the DJC in interface-manager, and hence the error is thrown.

Now, even if we fix the DJC key, the solution will not work for cluster.

I can think of two solutions :

 

  • Use DataTreeEventCallbackRegistrar to make one of the two guys wait till the other guy pushes all information.(This utility currently does not work in Cluster, see : GENIUS-141, which I am trying to fix first. Also not quite sure whether this will bring in any performance bottlenecks).
  • Move updating parent-ref logic back to neutron-vpn. Neutron-vpn can listen for ovsdb event to get the external-id information and do update parent-interface, which will easily align with single writer principle.
Comment by Shashidhar R [ 30/May/18 ]

We analyzed logs and below are our observations from ACL point of view

(1) From the logs OptimisticLockFailedException is observed for ietf-interfaces path as shown below (with latest logs attached):

 org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed for path /(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)interfaces/interface/interface[\{(urn:ietf:params:xml:ns:yang:ietf-interfaces?revision=2014-05-08)name=ecb03cb6-8a45-4b07-b760-3066f6017a5d}]

 (2) ACL only reads ietf-interfaces for doing validations; but does not modify interfaces. As it doesn’t modify interfaces, the  OptimisticLockFailedException observed above is not from ACL module.

 

I also discussed about this with Abhinav from neutrovpn point of view. Below are the details:

  1. In oxygen and fluorine, interface manager is updating the parent refs.
  2. As Neutronvpn creates/deletes the interface with jobCoordinator, ConflictingModificationAppliedException would not be in Neutronvpn.
  3. Also SG addition/deletion to a port is modification of one of the parameter of neutron port itself. As this is already handled with jobCoordinator, ConflictingModificationAppliedException would not be because of SG updates as well.

Modification of SGs (like addition or deletion of rules): Handled independent of ietf-interfaces both in Neutronvpn and ACL modules.

Assigning bug to Faseela for analyzing further from interface manger point of view.

Comment by Abhinav Gupta [ 30/May/18 ]

Hi Faseela,

Regarding the two possible solutions you’ve suggested:

  • Use DataTreeEventCallbackRegistrar to make one of the two guys wait till the other guy pushes all information.(This utility currently does not work in Cluster, see : GENIUS-141, which I am trying to fix first. Also not quite sure whether this will bring in any performance bottlenecks).
  • [Abhinav] Since this is at per port level, this could rightly bring in performance bottlenecks and we may want a better solution.

 

  • Move updating parent-ref logic back to neutron-vpn. Neutron-vpn can listen for ovsdb event to get the external-id information and do update parent-interface, which will easily align with single writer principle.
  • [Abhinav]We’ve always tried to keep nvpn decoupled from southbound logic. This could be a last resort solution if nothing else fits.

 

So currently,

NeutronVPN: Creates/deletes the interface, updates (adds) InterfaceACL augmentation for security updates

Inf-Mgr          : Updates the parentRefs

 

Since inf-mgr has the entire data to create inf upon receiving exernal-id info from OVSDB, can we create the inf the first time then itself, as well as handle deletes based on the same? ACL aug updates today are doing a read on the entire inf object and then doing a PUT (looks like a leftover PUT from when the entire object was fetched for updating parent-refs). This can be changed to MERGE as well as we can write directly on the augmentation path identifier.

 

This would align with the single writer principle. What do you think?

Comment by Faseela K [ 30/May/18 ]

AbhinavGupta,

  Interface-manager will have trouble in understanding the interface-type, trunk-port or sub-port as I don’t see that information coming from southbound. If this can be supplied by nova while creating the VM, it can be done. Could you please confirm?

Else the solution you have proposed is not feasible. So please pick one solution from the 2 I have given

Thanks,

Faseela

Comment by Faseela K [ 30/May/18 ]

Let us decide on what will be the preferred approach from the 2 solutions specified. Assigning the bug to Abhinav to decide on the solution. Based on whether the change is needed in neutron-vpn or interface-manager, we can change the assignee later on.

Comment by Faseela K [ 30/May/18 ]

shague : One quick question, is this yet another exception coming in the log, or are there functional impacts.

When I take a look at the code, we are using DJC retries, and if two threads are doing transaction.merge(), for the initial time we might get the exception, but subsequent retries would have gone through. Is that the case?

If so, do we just want to enhance DJC not print the whole exception stack trace for initial trials, and only do it if all retries fail?

vorburger : Does this seem valid?

Comment by Michael Vorburger [ 31/May/18 ]

k.faseela I am pretty sure that I have already done exactly what you propose quite a while ago, check out:

If you have reason to believe that this is not working as we would like it to, can you open a new Jira in INFRAUTILS with details of a case where you think it is printing the whole exception stack trace for initial trials? I don't see how.

Comment by Abhinav Gupta [ 31/May/18 ]

Looked into Karaf logs from build 510 to 539 for a few builds randomly. This exception doesn't show up.
509 was the last build where this exception was seen, and from the Karaf logs, looks like to be a conflict between SG update from NeutronVPN and parent ref update from inf-mgr.

Comment by Abhinav Gupta [ 31/May/18 ]

k.faseela, the exception here, as per build 509 occurs only once throughout the log where LocalThreePhaseCommitCohort prints an ERROR message.
Looks like the 2nd retry succeeded, and there's no functionality impact here.

 

Comment by Abhinav Gupta [ 14/Jun/18 ]

Hi shague, as this exception was last seen in build 509, do we need to keep tracking this one?

 

 

Generated at Wed Feb 07 20:23:37 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.