[CONTROLLER-1836] Deadlock scenario with multi-shard transactions Created: 11/Jun/18 Updated: 27/Jun/18 Resolved: 21/Jun/18 |
|
| Status: | Verified |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Nitrogen, Oxygen, Fluorine |
| Fix Version/s: | Fluorine, Oxygen SR3 |
| Type: | Bug | Priority: | High |
| Reporter: | Tom Pantelis | Assignee: | Tom Pantelis |
| Resolution: | Done | Votes: | 0 |
| Labels: | csit:3node | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Epic Link: | Clustering Stability | ||||||||
| Description |
|
The genius project has been running into a deadlock with multi-shard transactions. The log shows the following symptoms:

2018-05-30T15:08:06,198 | WARN | opendaylight-cluster-data-shard-dispatcher-88 | ShardDataTree | 240 - org.opendaylight.controller.sal-distributed-datastore - | 1.8.0.SNAPSHOT | member-1-shard-default-config: Current transaction member-1-datastore-config-fe-0-txn-1477-0 has timed out after 19233 ms in state CAN_COMMIT_COMPLETE
2018-05-30T15:08:06,198 | WARN | opendaylight-cluster-data-shard-dispatcher-65 | ShardDataTree | 240 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-inventory-config: Current transaction member-1-datastore-config-fe-0-txn-1478-0 has timed out after 19234 ms in state READY
2018-05-30T15:08:06,199 | ERROR | opendaylight-cluster-data-shard-dispatcher-88 | Shard | 232 - org.opendaylight.controller.sal-clustering-commons - | 1.8.0.SNAPSHOT | member-1-shard inventory-config: Cannot canCommit transaction member-1-datastore-config-fe-0-txn-1478-0 - no cohort entry found
2018-05-30T15:08:06,199 | ERROR | opendaylight-cluster-data-shard-dispatcher-65 | Shard | 232 - org.opendaylight.controller.sal-clustering-commons - 1.8.0.SNAPSHOT | member-1-shard-default-config: Cannot commit transaction member-1-datastore-config-fe-0-txn-1477-0 - no cohort entry found

The deadlock occurs if the ready messages are interleaved between 2 or more transactions that access the same shards. In this case, tx1 and tx2 are both writing to the inventory and default shards. tx1 sends the ready message to the default shard first and is added to the pendingTransactions queue before tx2. However, the opposite happens for the inventory shard, i.e. tx2 sends ready and is added to pendingTransactions first. So when tx2 sends CanCommit to the default shard, it's not at the head of pendingTransactions, so it's not processed: tx2 can't proceed until tx1 fully completes. tx1 sends CanCommit to the default shard and completes. It then sends CanCommit to the inventory shard, but tx2 is at the head of that queue, so tx1 can't proceed. So neither tx can make progress until they time out.

This issue was originally reported via https://jira.opendaylight.org/browse/GENIUS-166 and was fixed for single-node by https://git.opendaylight.org/gerrit/#/c/72650/; however, the same issue can occur for 3 (or multi) node. The following outlines a proposed solution that should work for single and multi-node.

The crux of the problem is that the ShardDataTree doesn't allow 3PC to start for a tx unless it's at the head of the pendingTransactions queue. This is done to honor the order in which the tx's were "readied", specifically to maintain tx chain integrity. The deadlock scenario occurs when 2 tx's access the same shards and, for the second shard in the sequence, the first tx is behind the second tx in the pendingTransactions queue.

Therefore I propose we relax this rule for all but the first shard in a tx. The ready messages (if multi-shard) would carry the sorted list of participating shard names, and the shard would use that list on the CanCommit request to determine whether a tx can be moved ahead of another tx in the queue to avoid potential deadlock. Take a pending tx in the READY state, call it tx A, that precedes the requesting tx in the queue. If the preceding participating shard names of tx A match those of the requesting tx, then the requesting tx is allowed to be moved ahead of tx A so that it is processed first. This avoids a potential deadlock in the case where tx A is behind the requesting tx in the pendingTransactions queue for a preceding shard. If the requesting tx is moved to the head of the queue as a result, then proceed with CanCommit.

Scenario 1:
tx1 -> shard A, shard B
Queue for shard A -> tx1, tx2
Scenario 2:
tx1 -> A, C
A -> tx1
Scenario 3:
tx1 -> A, B
A -> tx1
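
The queue rule described above can be illustrated with a small, language-neutral model. This is only a sketch under stated assumptions, not the controller's Java code: `Entry`, `preceding_shards`, `can_commit` and the `relaxed` flag are hypothetical names, each pendingTransactions queue is a plain list, and a tx is marked CAN_COMMIT_COMPLETE as soon as it is allowed to start CanCommit. It also assumes the requesting tx only jumps ahead of the contiguous run of matching READY tx's directly in front of it, which is one possible reading of the proposal.

```python
from dataclasses import dataclass

READY = "READY"
CAN_COMMIT_COMPLETE = "CAN_COMMIT_COMPLETE"


@dataclass
class Entry:
    """One pending tx in a shard's queue (hypothetical model type)."""
    tx: str
    shards: tuple  # sorted participating shard names from the ready message
    state: str = READY


def preceding_shards(entry, shard):
    """Participating shard names that come before `shard` for this tx."""
    return entry.shards[:entry.shards.index(shard)]


def can_commit(queue, shard, tx, relaxed=True):
    """Return True if `tx` may start CanCommit on `shard`.

    With relaxed=False only the head of the queue may proceed (the rule that
    deadlocks). With relaxed=True, a tx that has preceding shards may be moved
    ahead of READY tx's whose preceding shard names match its own.
    """
    pos = next(i for i, e in enumerate(queue) if e.tx == tx)
    mine = preceding_shards(queue[pos], shard)
    target = pos
    # The first shard of a tx (empty `mine`) always keeps strict ready order.
    if relaxed and mine:
        while target > 0:
            ahead = queue[target - 1]
            if ahead.state != READY or preceding_shards(ahead, shard) != mine:
                break
            target -= 1
    queue.insert(target, queue.pop(pos))
    if target == 0:
        queue[0].state = CAN_COMMIT_COMPLETE  # simplification of the 3PC steps
        return True
    return False


shards = ("default", "inventory")

# Strict rule: ready order is interleaved, so neither tx can finish.
strict_default = [Entry("tx1", shards), Entry("tx2", shards)]
strict_inventory = [Entry("tx2", shards), Entry("tx1", shards)]
strict_tx2_default = can_commit(strict_default, "default", "tx2", relaxed=False)
strict_tx1_default = can_commit(strict_default, "default", "tx1", relaxed=False)
strict_tx1_inventory = can_commit(strict_inventory, "inventory", "tx1", relaxed=False)

# Relaxed rule: tx1 is moved ahead of tx2 on the second shard.
default_q = [Entry("tx1", shards), Entry("tx2", shards)]
inventory_q = [Entry("tx2", shards), Entry("tx1", shards)]
tx2_default = can_commit(default_q, "default", "tx2")
tx1_default = can_commit(default_q, "default", "tx1")
tx1_inventory = can_commit(inventory_q, "inventory", "tx1")
```

In the strict run, tx2 is blocked on the default shard and tx1 is blocked on the inventory shard, which is the cycle seen in the log. In the relaxed run, tx1 is moved to the head of the inventory queue because tx2 is still READY there and both tx's have the same preceding shard ("default"), so tx1 can complete and tx2 follows.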
|
| Comments |
| Comment by Tom Pantelis [ 12/Jun/18 ] |
| Comment by Michael Vorburger [ 21/Jun/18 ] |
|
What are people's thoughts about (both need to, and required effort for) back-porting this one from Fluorine master to stable/oxygen for Oxygen SR3? shague and skitt, is this important for us? k.faseela, do you have anyone wanting to do the back-port? |
| Comment by Tom Pantelis [ 21/Jun/18 ] |
|
cherry-pick failed so it would have to be done manually. |
| Comment by Michael Vorburger [ 26/Jun/18 ] |
|
https://git.opendaylight.org/gerrit/#/c/73454/ proposes to back-port this for Oxygen SR3. |
| Comment by Faseela K [ 27/Jun/18 ] |
|
Not seeing the exception currently. |