  1. controller
  2. CONTROLLER-1440

Clustering: Implement persistence online backup and restore


Details

    • Bug
    • Status: Resolved
    • Resolution: Done
    • Beryllium
    • None
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 4564
    • Normal

    Description

      We need to be able to do online backup and restore of the persisted datastore without disrupting the shards.

      Proposed solution:

      For backup, provide an RPC that takes the backup file name as input. High-level steps:

      • For each shard, get a snapshot of the local data tree
      • Read the root node from each data tree snapshot and serialize
      • Store the shard name and serialized snapshot for each shard in a list
      • Serialize the list to the file
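
      The backup steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `BackupSketch`, `ShardSnapshot`, and `backup` are hypothetical names, Java serialization stands in for whatever format the real implementation uses, and the input map stands in for "take a snapshot of each shard's data tree and serialize its root node".

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; these are not the controller's actual classes.
final class BackupSketch {

    /** One list entry: the shard name plus its serialized root node. */
    static final class ShardSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String shardName;
        final byte[] serializedRoot;

        ShardSnapshot(String shardName, byte[] serializedRoot) {
            this.shardName = shardName;
            this.serializedRoot = serializedRoot.clone();
        }
    }

    /**
     * Builds one entry per shard and serializes the whole list to the backup file.
     * The map is an assumption: shard name -> already-serialized root node.
     */
    static List<ShardSnapshot> backup(Map<String, byte[]> serializedRootsByShard, Path backupFile)
            throws IOException {
        ArrayList<ShardSnapshot> snapshots = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : serializedRootsByShard.entrySet()) {
            snapshots.add(new ShardSnapshot(entry.getKey(), entry.getValue()));
        }
        // Write the list in one shot so the file holds all shards together.
        try (OutputStream out = Files.newOutputStream(backupFile);
             ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(snapshots);
        }
        return snapshots;
    }
}
```

      In the real system the RPC handler would gather the snapshots from the live shards; here the caller supplies them so the sketch stays self-contained.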

      For restore, we need to apply the serialized root node for each shard after recovery completes and before the shard becomes available for use. A restore can be triggered by a backup file that the user places in a well-known directory prior to startup. High-level steps:

      • On startup, read the backup file and de-serialize the list of shard snapshots
      • Delete the backup file so it isn't processed on subsequent restarts
      • For each shard that has a snapshot to restore, pass the serialized snapshot to the shard on construction
      • On shard recovery complete, the shard sends an ApplySnapshot message to itself, which persists the snapshot and applies it to the shard's state
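
      A rough sketch of the restore steps above, again under assumptions: `RestoreSketch`, `ShardSnapshot`, `readAndDeleteBackup`, and the `Shard` stand-in are hypothetical names, and the file is assumed to hold a Java-serialized list of snapshot entries. The real shard is an actor that sends itself an ApplySnapshot message; a direct method call stands in for that here, and persistence is omitted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; these are not the controller's actual classes.
final class RestoreSketch {

    /** One backup entry: the shard name plus its serialized root node. */
    static final class ShardSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String shardName;
        final byte[] serializedRoot;

        ShardSnapshot(String shardName, byte[] serializedRoot) {
            this.shardName = shardName;
            this.serializedRoot = serializedRoot.clone();
        }
    }

    /** Reads the backup file if present, then deletes it so later restarts skip it. */
    static Map<String, byte[]> readAndDeleteBackup(Path backupFile)
            throws IOException, ClassNotFoundException {
        Map<String, byte[]> byShard = new HashMap<>();
        if (!Files.exists(backupFile)) {
            return byShard; // no backup file, so nothing to restore
        }
        try (InputStream in = Files.newInputStream(backupFile);
             ObjectInputStream ois = new ObjectInputStream(in)) {
            @SuppressWarnings("unchecked")
            List<ShardSnapshot> snapshots = (List<ShardSnapshot>) ois.readObject();
            for (ShardSnapshot snapshot : snapshots) {
                byShard.put(snapshot.shardName, snapshot.serializedRoot);
            }
        }
        Files.delete(backupFile);
        return byShard;
    }

    /** Simplified stand-in for a shard. */
    static final class Shard {
        final String name;
        private final byte[] snapshotToRestore; // null when there is nothing to restore
        byte[] state = new byte[0];
        boolean available;

        Shard(String name, byte[] snapshotToRestore) {
            this.name = name;
            this.snapshotToRestore = snapshotToRestore;
        }

        /** Called once normal recovery finishes; applies the restore before going live. */
        void onRecoveryComplete() {
            if (snapshotToRestore != null) {
                applySnapshot(snapshotToRestore);
            }
            available = true;
        }

        // Stand-in for handling ApplySnapshot; the real shard would also persist it.
        private void applySnapshot(byte[] serializedRoot) {
            state = serializedRoot.clone();
        }
    }
}
```

      Shards with no entry in the backup are constructed with a null snapshot and simply become available after normal recovery.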


          People

            tpantelis Tom Pantelis
            tpantelis Tom Pantelis