controller / CONTROLLER-1836

Deadlock scenario with multi-shard transactions


      The Genius project has been running into a deadlock with multi-shard transactions. The log shows the following symptoms:

      2018-05-30T15:08:06,198 | WARN  | opendaylight-cluster-data-shard-dispatcher-88 | ShardDataTree | 240 - org.opendaylight.controller.sal-distributed-datastore - | 1.8.0.SNAPSHOT | member-1-shard-default-config: Current transaction member-1-datastore-config-fe-0-txn-1477-0 has timed out after 19233 ms in state CAN_COMMIT_COMPLETE
      2018-05-30T15:08:06,198 | WARN  | opendaylight-cluster-data-shard-dispatcher-65 | ShardDataTree | 240 - org.opendaylight.controller.sal-distributed-datastore - 1.8.0.SNAPSHOT | member-1-shard-inventory-config: Current transaction member-1-datastore-config-fe-0-txn-1478-0 has timed out after 19234 ms in state READY
      2018-05-30T15:08:06,199 | ERROR | opendaylight-cluster-data-shard-dispatcher-88 | Shard | 232 - org.opendaylight.controller.sal-clustering-commons - | 1.8.0.SNAPSHOT | member-1-shard inventory-config: Cannot canCommit transaction member-1-datastore-config-fe-0-txn-1478-0 - no cohort entry found 
      2018-05-30T15:08:06,199 | ERROR | opendaylight-cluster-data-shard-dispatcher-65 | Shard       | 232  - org.opendaylight.controller.sal-clustering-commons - 1.8.0.SNAPSHOT | member-1-shard-default-config: Cannot commit transaction member-1-datastore-config-fe-0-txn-1477-0 - no cohort entry found
      

      The deadlock occurs when the ready messages are interleaved between two or more transactions that access the same shards. In this case, tx1 and tx2 are both writing to the inventory and default shards. tx1 sends its ready message to the default shard first and is added to the pendingTransactions queue ahead of tx2. However, the opposite happens for the inventory shard, i.e., tx2 sends ready first and is added to pendingTransactions ahead of tx1. So when tx2 sends CanCommit to the default shard, it is not at the head of pendingTransactions and is not processed; tx2 cannot proceed until tx1 fully completes. tx1 sends CanCommit to the default shard and completes. It then sends CanCommit to the inventory shard, but tx2 is at the head of that queue, so tx1 cannot proceed either. Thus neither tx can make progress until they time out.
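The pre-fix behavior described above can be modeled as a toy sketch (this is not the actual ShardDataTree code; the queue contents and shard/tx names follow the example above):

```python
from collections import deque

# Toy model of the pre-fix rule: a shard lets a tx start 3PC only when
# that tx is at the head of its pendingTransactions queue.
# tx1 was readied first on the default shard, tx2 first on inventory.
pending = {
    "default":   deque(["tx1", "tx2"]),
    "inventory": deque(["tx2", "tx1"]),
}

def can_commit_allowed(shard, tx):
    """True if `tx` may start CanCommit on `shard` under the head-of-queue rule."""
    return pending[shard][0] == tx

# tx2 is blocked on default behind tx1, and tx1 is blocked on inventory
# behind tx2 - so neither transaction can make progress.
deadlocked = (not can_commit_allowed("default", "tx2")
              and not can_commit_allowed("inventory", "tx1"))
```

Each tx holds the head position the other needs, which is why both eventually hit the transaction timeout seen in the log.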

      This issue was originally reported via https://jira.opendaylight.org/browse/GENIUS-166 and was fixed for single-node deployments by https://git.opendaylight.org/gerrit/#/c/72650/; however, the same issue can still occur in a 3-node (or multi-node) cluster.

      The following outlines a proposed solution that should work for single and multi-node.

      The crux of the problem is that the ShardDataTree does not allow 3PC to start for a tx unless it is at the head of the pendingTransactions queue. This is done to honor the order in which the txs were readied, specifically to maintain tx chain integrity. The deadlock scenario occurs when two txs access the same shards and, for the second shard in the sequence, the first tx is behind the second tx in the pendingTransactions queue. Therefore I propose we relax this rule for all but the first shard in a tx: include the sorted list of participating shard names in the ready messages (if multi-shard) and use it, on the CanCommit request, to determine whether a tx may be moved ahead of another tx in the queue to avoid potential deadlock. If a preceding pending tx in the READY state (call it tx A) has the same preceding participating shard names as the requesting tx, then the requesting tx is allowed to be moved ahead of tx A in the queue so it is processed first, since matching preceding shards imply tx A is behind the requesting tx in the pendingTransactions queue of a preceding shard and the two would otherwise deadlock. If the requesting tx reaches the head of the queue as a result, then proceed with CanCommit.
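A minimal sketch of the proposed check, assuming the ready message carries the tx's sorted participating shard list (the helper names below are hypothetical, not actual controller APIs):

```python
def preceding_shards(participating, shard):
    """Shards listed before `shard` in a tx's sorted participating-shard list."""
    return participating[:participating.index(shard)]

def may_move_ahead(requester_preceding, queued_preceding):
    """Proposed relaxation (sketch): on a CanCommit request, the requesting
    tx may be moved ahead of a preceding READY tx in pendingTransactions
    only if (a) this is not the requester's first shard (non-empty
    preceding list) and (b) both txs have the same preceding participating
    shards, which implies the requester is already ahead of the queued tx
    on one of those preceding shards."""
    return bool(requester_preceding) and requester_preceding == queued_preceding
```

For example, at shard B in Scenario 1 below, both txs have preceding shards ["A"], so the requester may be moved ahead; at a tx's first shard the preceding list is empty and the original head-of-queue rule still applies.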

      Scenario 1:

      tx1 -> shard A, shard B
      tx2 -> A, B

      Queue for shard A -> tx1, tx2
      B -> tx2, tx1

      • tx2 sends CanCommit to A - tx1 is at the head of the Q, so tx2 is not allowed to proceed because A is the first shard in tx2's participating shard list.
      • tx1 sends CanCommit to A and is at the head of the Q so proceeds.
      • tx1 sends CanCommit to B - tx2 is at the head of the Q but the preceding shards in tx1's participating shard list [A] matches that of tx2 [A] so tx1 is moved ahead of tx2 and proceeds with CanCommit. Note: previously this resulted in deadlock.
      • tx1 completes 3PC
      • tx2 proceeds on A etc

      Scenario 2:

      tx1 -> A, C
      tx2 -> B, C

      A -> tx1
      B -> tx2
      C -> tx2, tx1

      • tx1 sends CanCommit to A and is at the head of the Q so proceeds.
      • tx2 sends CanCommit to B and is at the head of the Q so proceeds.
      • tx1 sends CanCommit to C - tx2 is at the head of the Q. The preceding shards in tx1's participating shard list [A] do not match that of tx2 [B] so tx1 is not moved and does not proceed with CanCommit. This preserves the ready order.
      • tx2 sends CanCommit to C and is at the head of the Q so proceeds.
      • tx1 proceeds with CanCommit C etc

      Scenario 3:

      tx1 -> A, B
      tx2 -> B, C

      A -> tx1
      B -> tx2, tx1
      C -> tx2

      • tx1 sends CanCommit to A and is at the head of the Q so proceeds.
      • tx1 sends CanCommit to B - tx2 is at the head of the Q. The preceding shards in tx1's participating shard list [A] do not match that of tx2 [] so tx1 is not moved and does not proceed with CanCommit. This preserves the ready order.
      • tx2 sends CanCommit to B and is at the head of the Q so proceeds.
      • tx1 proceeds with CanCommit on B etc
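The three scenarios above can be checked against the proposed rule. The sketch below re-states the decision function (a hypothetical helper, not the actual implementation) and asserts the outcome of each bullet where one tx finds another at the head of the queue:

```python
def may_move_ahead(requester_preceding, queued_preceding):
    # Proposed rule (sketch): move the CanCommit requester ahead of a
    # queued READY tx only when this is not the requester's first shard
    # and both txs share the same preceding participating shards.
    return bool(requester_preceding) and requester_preceding == queued_preceding

# Scenario 1: tx1 -> [A, B], tx2 -> [A, B]
assert not may_move_ahead([], [])        # tx2 at A: its first shard, must wait
assert may_move_ahead(["A"], ["A"])      # tx1 at B: moved ahead of tx2, no deadlock

# Scenario 2: tx1 -> [A, C], tx2 -> [B, C]
assert not may_move_ahead(["A"], ["B"])  # tx1 at C: preceding lists differ, waits

# Scenario 3: tx1 -> [A, B], tx2 -> [B, C]
assert not may_move_ahead(["A"], [])     # tx1 at B: B is tx2's first shard, waits
```

Only Scenario 1, the genuine deadlock case, triggers a reordering; Scenarios 2 and 3 preserve the original ready order.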

            tpantelis Tom Pantelis