[CONTROLLER-1005] Clustering: Write Tx commit may fail with an error stating the Tx has already been sealed Created: 06/Nov/14  Updated: 15/Nov/14  Resolved: 15/Nov/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks CONTROLLER-1016 Clustering : BGP - Linkstate topology... Verified
External issue ID: 2337

 Description   

While running the DistributedDataStoreIntegrationTest repeatedly in a script, I found an intermittent failure that occurred a couple of times after 20-30 successful runs.

The failure occurred each time in testWriteTransactionWithMultipleShards with an IllegalStateException stating that the transaction has already been sealed. This means that a modification operation (put/merge/delete) was done after the transaction was submitted. However, the sequence of calls in the test is valid: it simply creates a write Tx, makes 2 write calls and calls ready.

After debugging, I found a timing bug in the TransactionProxy.TransactionFutureCallback inner class. On each operation, the operation method first checks whether a Tx has been created in the related shard yet (via a non-null TransactionContext instance). If not, a TransactionFutureCallback instance is created, which initiates the shard Tx creation asynchronously, and the operation is cached in the TransactionFutureCallback to be executed when the Tx creation finishes. The operations are cached in a List, with synchronization, to preserve order. However, there is a small timing window in which the operation method may see a non-null TransactionContext before the TransactionFutureCallback has finished executing the previously cached operations. In this case, the operation method executes the operation directly on the TransactionContext, which can cause it to be performed out of order. In the test failure, the ready operation was performed before the write.
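
To make the timing window concrete, here is a minimal single-threaded sketch that replays the bad interleaving step by step. It is not the actual TransactionProxy code: RecordingContext, EarlyPublishCallback and their methods are hypothetical stand-ins, operations are reduced to plain strings, and the "publish" and "replay" steps are split into separate methods only so the interleaving can be forced deterministically.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyPublishRace {
    /** Hypothetical stand-in for the per-shard TransactionContext; it only records call order. */
    static final class RecordingContext {
        final List<String> executed = new ArrayList<>();
        void execute(String op) { executed.add(op); }
    }

    /** Hypothetical, simplified callback that publishes the context BEFORE replaying cached ops. */
    static final class EarlyPublishCallback {
        private final List<String> cachedOps = new ArrayList<>();
        private volatile RecordingContext context; // non-null once published

        /** Models an operation method: run directly if a context is visible, else cache. */
        void enqueueOrExecute(String op) {
            RecordingContext ctx = context;
            if (ctx != null) {
                ctx.execute(op); // bypasses any operations still waiting in the cache
            } else {
                synchronized (cachedOps) { cachedOps.add(op); }
            }
        }

        /** Step 1 of the problematic completion path: make the context visible. */
        void publish(RecordingContext ctx) { context = ctx; }

        /** Step 2: replay the operations cached while the shard Tx was being created. */
        void replayCached() {
            synchronized (cachedOps) {
                for (String op : cachedOps) { context.execute(op); }
                cachedOps.clear();
            }
        }
    }

    /** Forces the interleaving: "ready" arrives between publish and replay. */
    static List<String> simulate() {
        EarlyPublishCallback callback = new EarlyPublishCallback();
        callback.enqueueOrExecute("write-1"); // cached, no context yet
        callback.enqueueOrExecute("write-2"); // cached, no context yet
        RecordingContext ctx = new RecordingContext();
        callback.publish(ctx);                // context becomes visible too early
        callback.enqueueOrExecute("ready");   // sees the context, runs immediately
        callback.replayCached();              // the writes run only now
        return ctx.executed;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [ready, write-1, write-2]
    }
}
```

In the real code the window is hit only when the caller thread and the completion thread interleave this way, which is consistent with the failure showing up only once in dozens of runs.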

The solution is to only publish the TransactionContext instance in TransactionFutureCallback after it has finished executing the cached operations.
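
One way to picture "publish only after the cached operations have run" is the sketch below. It is an illustration under assumptions, not the patch in the Gerrit changes: RecordingContext, Callback, enqueueOrExecute and onTxCreated are hypothetical names, operations are plain strings, and a single caller thread is assumed to issue the operations of one transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PublishAfterReplay {
    /** Hypothetical stand-in for the per-shard TransactionContext; it only records call order. */
    static final class RecordingContext {
        final List<String> executed = new ArrayList<>();
        synchronized void execute(String op) { executed.add(op); }
    }

    /** Hypothetical callback that publishes the context only once the cache is drained. */
    static final class Callback {
        private final Object lock = new Object();
        private final List<String> cachedOps = new ArrayList<>();
        private RecordingContext context; // guarded by lock

        /** Models an operation method: the null check and the caching happen under one lock. */
        void enqueueOrExecute(String op) {
            RecordingContext ctx;
            synchronized (lock) {
                ctx = context;
                if (ctx == null) {
                    cachedOps.add(op); // not published yet, keep the operation in order
                    return;
                }
            }
            ctx.execute(op); // published, so every cached operation has already run
        }

        /** Called when the shard Tx exists: drain in batches, then publish. */
        void onTxCreated(RecordingContext ctx) {
            while (true) {
                List<String> batch;
                synchronized (lock) {
                    if (cachedOps.isEmpty()) {
                        context = ctx; // publish only when nothing is pending
                        return;
                    }
                    batch = new ArrayList<>(cachedOps);
                    cachedOps.clear();
                }
                // Run outside the lock; operations arriving meanwhile are still cached.
                for (String op : batch) { ctx.execute(op); }
            }
        }
    }

    /** Runs the completion on a second thread while the caller issues operations. */
    static List<String> simulate() {
        Callback callback = new Callback();
        RecordingContext ctx = new RecordingContext();
        Thread completion = new Thread(() -> callback.onTxCreated(ctx));
        callback.enqueueOrExecute("write-1");
        completion.start();
        callback.enqueueOrExecute("write-2");
        callback.enqueueOrExecute("ready");
        try {
            completion.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ctx.executed;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("write-1", "write-2", "ready");
        for (int i = 0; i < 1000; i++) {
            if (!simulate().equals(expected)) { throw new AssertionError("out of order"); }
        }
        System.out.println("order preserved in all runs");
    }
}
```

Because the context becomes visible only when the cache is empty, each operation is either cached (and replayed in order) or executed directly after everything cached before it, whichever way the two threads interleave.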



 Comments   
Comment by Tom Pantelis [ 06/Nov/14 ]

https://git.opendaylight.org/gerrit/#/c/12584/ - Helium

https://git.opendaylight.org/gerrit/#/c/12581/ - master
