[YANGTOOLS-559] Delete transactions fail in MinMaxElementsValidation under load Created: 26/Nov/15 Updated: 10/Apr/22 Resolved: 09/Feb/16 |
|
| Status: | Resolved |
| Project: | yangtools |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Stephen Kitt | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 4684 | ||||||||
| Priority: | High | ||||||||
| Description |
|
In OVSDB we're seeing transaction failures from MinMaxElementsValidation under load; the log is (with the extra info from 30247 and 30249): 2015-11-26 15:40:12,794 | ERROR | lt-dispatcher-24 | LocalThreePhaseCommitCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Failed to prepare transaction member-1-chn-1-txn-38 on backend ]=NodeModification [identifier=(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640/bridge/br-int}], modificationType=DELETE, childModification={}], (urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640}]=NodeModification [identifier=(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640}], modificationType=DELETE, childModification={}]}] and Optional.of(MaterializedContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@7cb06828, children={(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640/bridge/br-int}]=LazyContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={AugmentationIdentifier {childNames=[(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)controller-entry, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-ids, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-configs, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)datapath-id, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)fail-mode, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)protocol-entry, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)flow-node, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-uuid, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-name, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-openflow-node-ref, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)managed-by, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)datapath-type]}=LazyContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-ids=MaterializedContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-ids[ {(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-id-key=opendaylight-iid}]=MaterializedContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-id-value=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@6e9ea50b, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-external-id-key=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@74c4f53a}}}}, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-configs=MaterializedContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-configs[ {(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-config-key=disable-in-band}]=MaterializedContainerNode{subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@398378c4, children={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-config-value=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@bca32a2, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-other-config-key=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@29ad2e47}}}}, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-name=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@32806084, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)fail-mode=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@613e3ecb, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)bridge-uuid=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@3b3b4f6c, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)managed-by=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@7ba4815a, (urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)datapath-type=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@4a914aea}}, (urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.ValueNode@5ac9fd15}}}})) Regards, Stephen |
| Comments |
| Comment by Filip Gregor [ 27/Nov/15 ] |
|
Can you please update and rebuild yangtools, and retry again. thank you |
| Comment by Stephen Kitt [ 27/Nov/15 ] |
|
It's still failing: 2015-11-27 16:35:40,215 | WARN | lt-dispatcher-15 | SimpleShardDataTreeCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Store Tx member-1-chn-1-txn-238: Data validation failed for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node. If you want to be able to reproduce, clone ovsdb, checkout 1ae9f08, then mvn clean install -DskipTests -DskipIT (the README file has the instructions for setting up the docker-compose image). |
| Comment by Stephen Kitt [ 27/Nov/15 ] |
|
The next logged ERROR (rather than just WARN) is 2015-11-27 16:35:40,216 | ERROR | lt-dispatcher-17 | LocalThreePhaseCommitCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Failed to prepare transaction member-1-chn-1-txn-238 on backend at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:62)[178:org.opendaylight.controller.sal-distributed-datastore:1.3.0.SNAPSHOT] |
| Comment by Stephen Kitt [ 01/Dec/15 ] |
|
Is https://git.opendaylight.org/gerrit/30389 supposed to fix this on its own, or does it need to be combined with https://git.opendaylight.org/gerrit/30266 and https://git.opendaylight.org/gerrit/30265 to work? I tried a rebuild of 30389, and I still get the same issue in ovsdb. |
| Comment by Stephen Kitt [ 08/Dec/15 ] |
|
As requested, I just tried with current yangtools (as rebuilt last night); I'm still seeing failures: 2015-12-08 09:57:25,387 | WARN | ult-dispatcher-4 | SimpleShardDataTreeCohort | 178 - org.opendaylight.controller.sal-distr ]=NodeModific ], modificationType=DELETE, childModification={}]}] and Optional.of(Materiali followed by 2015-12-08 09:57:25,389 | ERROR | lt-dispatcher-27 | LocalThreePhaseCommitCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Failed to prepare transaction member-1-txn-362 on backend ]=NodeModification [identifier=(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640}], modificationType=DELETE, childModification={}]}] and Optional.of(MaterializedContainerNode{version=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@7f39a732, subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@4eab30f1, children={}})) I'll try a rebuild of the current yangtools git master. |
| Comment by Stephen Kitt [ 08/Dec/15 ] |
|
The reproducer branch is https://git.opendaylight.org/gerrit/30965 |
| Comment by Stephen Kitt [ 08/Dec/15 ] |
|
Same with current yangtools git master: 2015-12-08 10:17:16,007 | WARN | lt-dispatcher-20 | SimpleShardDataTreeCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Unexpected failure in validation phase ]=NodeModification [identifier=(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640}], modificationType=DELETE, childModification={}]}] and Optional.of(MaterializedContainerNode{version=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@7c77d69e, subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@64c8ba95, children={}})) followed by 2015-12-08 10:17:16,014 | ERROR | lt-dispatcher-29 | LocalThreePhaseCommitCohort | 178 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Failed to prepare transaction member-1-chn-1-txn-8 on backend ]=NodeModification [identifier=(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node[ {(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://127.0.0.1:6640}], modificationType=DELETE, childModification={}]}] and Optional.of(MaterializedContainerNode{version=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@7c77d69e, subtreeVersion=org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@64c8ba95, children={}})) |
| Comment by Anil Vishnoi [ 15/Jan/16 ] |
|
I am also facing the same issue with openflowplugin project as well. |
| Comment by Colin Dixon [ 19/Jan/16 ] |
|
During the MD-SAL call today, Robert said that the problem was understood, but the fix was more complex than hoped and it was likely going to target Beryllium-RC1. |
| Comment by Sam Hague [ 19/Jan/16 ] |
|
(In reply to Colin Dixon from comment #10) Is there a workaround or any way to mitigate the issue? The test in the bug says "under load" but we see the error when there really isn't a load. This causes random failures and our IT tests to fail. |
| Comment by Robert Varga [ 20/Jan/16 ] |
|
The 'under load' means the problem is more likely to happen because of how transactions are sequenced. It is not a race condition in the code, but rather failure to handle all the cases correctly. If we do not solve this week we will disable min/max validation for RC1 and reintroduce only after it is fixed. |
| Comment by Robert Varga [ 25/Jan/16 ] |
| Comment by Tony Tkacik [ 26/Jan/16 ] |
|
Patches were merged to Boron and stable/beryllium. |
| Comment by Stephen Kitt [ 26/Jan/16 ] |
|
Our reproducer no longer trips up with the fix, so this looks good to me. Thanks! |
| Comment by Stephen Kitt [ 08/Feb/16 ] |
|
Unfortunately we appear to be seeing this again in CSIT. https://jenkins.opendaylight.org/releng/job/ovsdb-csit-1node-southbound-only-beryllium/868/console fails in a setup designed to reproduce this (or rather, duplicate |
| Comment by Stephen Kitt [ 09/Feb/16 ] |
|
The CSIT karaf.log doesn't show the relevant exception, so the failure is likely due to something else. (Thanks to Robert for investigating!) |