[CONTROLLER-1881] DeadTransactionException failure in ELM longevity test Created: 09/Jan/19  Updated: 31/May/20  Resolved: 27/May/20

Status: Resolved
Project: controller
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: High
Reporter: Tom Pantelis Assignee: Jamo Luhrsen
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

This is occurring consistently, e.g. https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-neon/21/robot-plugin/log.html.gz

In the iteration that failed, odl2 was both the shard leader (${idx_from}) and the node picked to generate transactions (${idx_trans}), and odl1 was the node chosen to become the new leader (${idx_to}).

The odl-mdsal-lowlevel-control:write-transactions RPC sent to odl2 failed in WriteTransactionsHandler.start while writing the initial top-level nodes using standalone transactions.

In the odl2 karaf log, we see:

2019-01-06T21:32:51,827 | WARN  | opendaylight-cluster-data-shard-dispatcher-27 | AbstractFrontendHistory          | 226 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-2-shard-default-config: Request CommitLocalTransactionRequest{target=member-2-datastore-config-fe-0-txn-1546-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#-1325814690], coordinated=false} is contained purged transactions [[0..1546]]

2019-01-06T21:32:51,887 | WARN  | qtp366157713-98  | WriteTransactionsHandler         | 230 - org.opendaylight.controller.samples.clustering-it-provider - 1.9.0.SNAPSHOT | Unable to ensure IdInts list for id: prefix-2 exists.
org.opendaylight.controller.md.sal.common.api.data.TransactionCommitFailedException: canCommit encountered an unexpected failure
        at org.opendaylight.controller.sal.core.compat.LegacyDOMDataBrokerAdapter$1.newWithCause(LegacyDOMDataBrokerAdapter.java:79) ~[?:?]
        at org.opendaylight.controller.sal.core.compat.LegacyDOMDataBrokerAdapter$1.newWithCause(LegacyDOMDataBrokerAdapter.java:68) ~[?:?]
        at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:106) ~[287:org.opendaylight.yangtools.util:2.1.6]
        at org.opendaylight.controller.sal.core.compat.LegacyDOMDataBrokerAdapter$DOMDataTransactionAdapter$1.onFailure(LegacyDOMDataBrokerAdapter.java:303) ~[?:?]
...
Caused by: org.opendaylight.controller.cluster.access.commands.DeadTransactionException: Transactions [[0..1546]] have been purged
        at org.opendaylight.controller.cluster.datastore.AbstractFrontendHistory.handleTransactionRequest(AbstractFrontendHistory.java:88) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.handleTransactionRequest(LeaderFrontendState.java:211) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:562) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleRequestEnvelope(Shard.java:389) ~[?:?]
        at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:316) ~[?:?]
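The WARN entry and the stack trace show the shard leader rejecting a CommitLocalTransactionRequest for member-2-datastore-config-fe-0-txn-1546-0 because transaction 1546 already falls inside the purged range [[0..1546]]. The following is a minimal, hypothetical Java sketch of that kind of check, assuming purged transaction IDs are tracked per frontend history as merged, inclusive ranges. The class and method names (PurgedTransactionTracker, purge, isPurged, checkRequest) and the nested exception are illustrative stand-ins, not the controller's actual AbstractFrontendHistory code.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of per-history purged-transaction tracking.
 * This is NOT the OpenDaylight implementation; all names are illustrative.
 */
public class PurgedTransactionTracker {

    /** Illustrative stand-in for the real DeadTransactionException. */
    public static final class DeadTransactionException extends Exception {
        public DeadTransactionException(String message) {
            super(message);
        }
    }

    // Maps the lower bound of each purged range to its inclusive upper bound.
    // purge() keeps ranges disjoint and non-adjacent, so one floorEntry lookup suffices.
    private final TreeMap<Long, Long> purged = new TreeMap<>();

    /** Marks transaction IDs lo..hi (inclusive) as purged, merging overlapping or adjacent ranges. */
    public void purge(long lo, long hi) {
        // Restrict IDs to [0, Long.MAX_VALUE) so lo - 1 and hi + 1 cannot overflow.
        if (lo < 0 || lo > hi || hi == Long.MAX_VALUE) {
            throw new IllegalArgumentException("invalid range " + lo + ".." + hi);
        }
        // Extend backwards if an existing range overlaps or touches lo.
        Map.Entry<Long, Long> floor = purged.floorEntry(lo);
        if (floor != null && floor.getValue() >= lo - 1) {
            lo = floor.getKey();
            hi = Math.max(hi, floor.getValue());
        }
        // Absorb every range that starts inside, or immediately after, the new range.
        Map.Entry<Long, Long> next = purged.ceilingEntry(lo);
        while (next != null && next.getKey() <= hi + 1) {
            hi = Math.max(hi, next.getValue());
            purged.remove(next.getKey());
            next = purged.ceilingEntry(lo);
        }
        purged.put(lo, hi);
    }

    /** Returns true if the transaction ID lies inside a purged range. */
    public boolean isPurged(long txId) {
        Map.Entry<Long, Long> floor = purged.floorEntry(txId);
        return floor != null && floor.getValue() >= txId;
    }

    /** Rejects a request whose transaction has already been purged. */
    public void checkRequest(long txId) throws DeadTransactionException {
        if (isPurged(txId)) {
            throw new DeadTransactionException("Transactions " + describe() + " have been purged");
        }
    }

    /** Formats the purged ranges in a style similar to the "[[0..1546]]" text in the log. */
    public String describe() {
        StringBuilder sb = new StringBuilder("[");
        String sep = "";
        for (Map.Entry<Long, Long> e : purged.entrySet()) {
            sb.append(sep).append('[').append(e.getKey()).append("..").append(e.getValue()).append(']');
            sep = ", ";
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        PurgedTransactionTracker history = new PurgedTransactionTracker();
        history.purge(0, 1546);
        try {
            history.checkRequest(1546);
        } catch (DeadTransactionException e) {
            System.out.println(e.getMessage()); // Transactions [[0..1546]] have been purged
        }
    }
}
```

Merging ranges on insertion is only a convenience for this sketch: it lets a single `floorEntry` lookup decide membership. In this model, a commit request arriving for a transaction ID the history has already purged fails immediately, which matches the shape of the failure seen on odl2.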


 Comments   
Comment by Jamo Luhrsen [ 27/May/20 ]

This exception is not seen in the latest runs of this job on Magnesium. Since this bug is now nearly 18 months old with no activity, and the specific exception does not appear in the karaf logs, I am closing it as "Cannot Reproduce".

Generated at Wed Feb 07 19:56:41 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.