Uploaded image for project: 'controller'
  1. controller
  2. CONTROLLER-1077

Clustering: Race condition may cause missing routes in RPC BucketStore

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved
    • Resolution: Done
    • Helium
    • None
    • mdsal
    • None
    • Operating System: All
      Platform: All

    • 2526
    • High

    Description

      The workflow by which the RpcRegistry adds route identifiers into the BucketStore is as follows:

      • The RpcRegistry sends the GetLocalBucket message to the BucketStore and waits on the Future.
      • BucketStore returns GetLocalBucketReply containing its internal local Bucket instance.
      • On Future complete, the RpcRegistry extracts the Bucket instance from the GetLocalBucketReply.
      • RpcRegistry calls getData on the Bucket instance to obtain the RoutingTable. getData returns a copy of its internal RoutingTable.
      • RpcRegistry adds the new RouteIdentifiers into its local RoutingTable copy.
      • RpcRegistry calls setData on the Bucket with the new RoutingTable instance.
      • RpcRegistry sends an UpdateBucket message with the Bucket instance.
      • The BucketStore extracts the Bucket instance from the UpdateBucket message and sets its localBucket member.

      The problem is that the BucketStore exposes its internal localBucket, thus allowing it to be modified externally. The manner in which the RpcRegistry modifies the Bucket instance is not thread safe and introcudes a race condition. This can result in missing RouteIdentifiers as in the following scenario:

      • A GetLocalBucket Future callback is notified on Thread 1.
      • At the same time, another GetLocalBucket Future callback is notified on Thread 2.
      • Both threads call getData on the returned Bucket instance and get a local copy.
      • Both add their new routes into their local RoutingTable.
      • Thread 1 calls setData on the Bucket instance.
      • Thread 2 calls setData on the Bucket instance.
      • Since they both had their own local RoutingTable this results in Thread 2 overwriting the instance set by Thread 1 and thus Thread 1's routes are lost.

      In addition, neither the 'data' member nor "version" member in Bucket are volatile so the writes may not be visible to other threads. Also setData should atomically set 'data' and 'version' for consistency.

      Propose solution:

      The BucketStore should not expose its internal state and allow it to be mutated externally. We should remove the GetLocalBucket message. Instead of getting the current routing table, modifying a local copy, then writing it back, which is problematic, we should just send an AddToLocalBucket message to the BucketStore and have it update its local Bucket. Similar with remove.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            tpantelis Tom Pantelis
            tpantelis Tom Pantelis
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: