[NETVIRT-1324] OptimisticLockFailedException.../flow/flow...id=L3.53688268439021.20.0L3 Created: 19/Jun/18  Updated: 17/Sep/18  Resolved: 17/Sep/18

Status: Resolved
Project: netvirt
Component/s: None
Affects Version/s: None
Fix Version/s: Fluorine, Neon

Type: Bug Priority: Lowest
Reporter: Sam Hague Assignee: Michael Vorburger
Resolution: Done Votes: 0
Labels: csit:3node
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to CONTROLLER-1855 Controller should not error log any O... Resolved
relates to NETVIRT-1015 NatEvpnUtil: getExtNwProvTypeFromRout... Resolved
Epic Link: Clustering Stability

 Description   
org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:53688268439021}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=20}]/flow/flow[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=L3.53688268439021.20.0L3.}]



 Comments   
Comment by Jamo Luhrsen [ 28/Jun/18 ]

shague is this just an exception from CSIT that we turned in to a jira? No real context to go on
right? Will someone be able to address it with just the exception line?

Comment by Michael Vorburger [ 28/Jun/18 ]

shague I could probably help with this this one (by adding retry), but the log above is too minimalistic to be actionable - we would need a full stack trace, to see where in the code that OptimisticLockFailedException comes from. If that's not available, I propose that you close this issue, and a new one be opened with details, when available.

Comment by Sam Hague [ 28/Jun/18 ]

I normally add the job and the karaf logs that this came from. I will try to dig it up.

Comment by Sam Hague [ 28/Jun/18 ]

vorburger adding a retry - does that mean an exception would not come out until the last retry failed? Similar to other exceptions you fixed a while back with the retry framework?

Other question: that is the right behavior, right? OLFE is not a problem in itself since you could have multiple writers, so you retry and hope it passes. Or is this a design issue that should be fixed not to have multiple writers to the same tree data?

Comment by Michael Vorburger [ 02/Jul/18 ]

> does that mean an exception would not come out until the last retry failed? 

yup; that's what the https://github.com/opendaylight/genius/blob/master/mdsalutil/mdsalutil-api/src/main/java/org/opendaylight/genius/infra/RetryingManagedNewTransactionRunner.java does (see https://github.com/opendaylight/genius/blob/master/mdsalutil/mdsalutil-api/src/main/java/org/opendaylight/genius/infra/RetryingManagedNewTransactionRunnerImpl.java for details)

>  that is the right behavior, right? OLFE is not a problem in itself since you could have multiple writers, so you retry and hope it passes.

It will fix the problem, is very easy to do (you should try doing it yourself!), and is what we have done elsewhere.

> Or is this a design issue that should be fixed not to have multiple writers to the same tree data?

You can also fix this issue that way, but that is a much bigger design change (if it's even possible).

Comment by Jamo Luhrsen [ 14/Aug/18 ]

this is still happening, but it's possible that it's not affecting functionality. Without more context, it will be hard for someone like vorburger to look at it.

shague can you dig up a karaf.log that has this?

Comment by Sam Hague [ 21/Aug/18 ]

still happening, will add further triage: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-oxygen/31/odl_1/odl1_err_warn_exception.log.gz

2018-08-13T05:08:13,967 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | LocalThreePhaseCommitCohort      | 235 - org.opendaylight.controller.sal-distributed-datastore - 1.7.4 | Failed to prepare transaction member-1-datastore-config-fe-0-txn-112-0 on backend
org.opendaylight.controller.md.sal.common.api.data.OptimisticLockFailedException: Optimistic lock failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:119324593392072}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=20}]/flow/flow[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=L3.119324593392072.20.0L3.}]
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was created by other transaction.
2018-08-13T05:08:32,925 | WARN  | pool-89-thread-1 | OutboundQueueEntry               | 389 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.4 | Can't commit a completed message.
2018-08-13T05:08:32,925 | WARN  | pool-89-thread-1 | HashedWheelTimer                 | 62 - io.netty.common - 4.1.22.Final | An exception was thrown by TimerTask.
java.lang.NullPointerException: null
2018-08-13T05:09:00,106 | WARN  | epollEventLoopGroup-9-2 | HandshakeManagerImpl             | 380 - org.opendaylight.openflowplugin.impl - 0.6.4 | issuing disconnect during handshake [/10.30.170.201:56960]
Comment by Michael Vorburger [ 28/Aug/18 ]

shague et al, so as discussed during the Kernel Projects call today:

In CONTROLLER-1855, it was decided that WARN from LocalThreePhaseCommitCohort was "benign" and could be ignored - if applications handled it; in this case, whatever writes to opendaylight:flow:inventory needs to retry the failed future; see below. (I had argued in the discussion of that bug that it should therefore just be an INFO or even DEBUG intead of a WARN; but it was decided that it's safer to keep it as a WARN, but not ERROR.)

So the (first; others are totally unrelated to this issue!) WARN shown above should be ignored. What still does need to be looked into in this issue is these two ERROR logs; this first one is directly related to that WARN on opendaylight:flow:inventory:

2018-08-13T05:42:03,918 | ERROR | CommitFutures-30 | JobCoordinatorImpl               | 281 - org.opendaylight.infrautils.jobcoordinator-impl - 1.3.4 | Job still failed on final retry: JobEntry{key='MAC-FA:16:3E:D2:50:A7 ELAN_TAG-5006DPN_ID-119324593392072', mainWorker=org.opendaylight.netvirt.elan.internal.ElanPacketInHandler$$Lambda$1757/325670119@164ed935, rollbackWorker=null, retryCount=3/3, futures=[org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$AsyncNotifyingSettableFuture@534c5598[status=FAILURE, cause=[OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:119324593392072}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow/flow[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50500611932459339207227FA:16:3E:D2:50:A7}], errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:119324593392072}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50}]/flow/flow[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=50500611932459339207227FA:16:3E:D2:50:A7}], severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.]]}]]]}

whereas this other one is separate:

2018-08-13T07:29:52,155 | ERROR | opendaylight-cluster-data-notification-dispatcher-76 | UpgradeUtils                     | 266 - org.opendaylight.genius.mdsalutil-impl - 0.4.4 | Unable to update UpgradeState
java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.]]}
Caused by: org.opendaylight.controller.md.sal.common.api.data.OptimisticLockFailedException: Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.
Comment by Michael Vorburger [ 30/Aug/18 ]

GENIUS-207 will track the solution to the OLFE in genius UpgradeState.

This bug will track netvirt.elan.internal.ElanPacketInHandler flow inventory job OLFE.

Generated at Wed Feb 07 20:23:47 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.