[CONTROLLER-1440] Clustering: Implement persistence online backup and restore Created: 31/Oct/15  Updated: 15/Nov/15  Resolved: 15/Nov/15

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 4564
Priority: Normal

 Description   

We need to be able to do online backup and restore of the persisted datastore without disrupting the shards.

Proposed solution:

For backup, provide an RPC that takes the backup file name as input. High-level steps:

  • For each shard, get a snapshot of the local data tree
  • Read the root node from each data tree snapshot and serialize it
  • Store the shard name and serialized snapshot for each shard in a list
  • Serialize the list to the file
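The backup steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the controller's implementation. The `DatastoreBackup`, `ShardSnapshot`, and `ShardSnapshotSource` names are hypothetical. Each shard is abstracted as a callback that returns its serialized snapshot, and plain Java serialization is assumed for the backup file format.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class DatastoreBackup {
    /** Hypothetical entry in the backup file: a shard name plus its serialized snapshot. */
    public record ShardSnapshot(String shardName, byte[] serializedSnapshot) implements Serializable {}

    /** Hypothetical hook: snapshots the shard's local data tree and returns it serialized. */
    @FunctionalInterface
    public interface ShardSnapshotSource {
        byte[] takeSerializedSnapshot() throws IOException;
    }

    private DatastoreBackup() {}

    /** Collects one snapshot per shard into a list and writes the list to backupFile. */
    public static void backup(Map<String, ShardSnapshotSource> shards, Path backupFile) throws IOException {
        List<ShardSnapshot> snapshots = new ArrayList<>();
        for (Map.Entry<String, ShardSnapshotSource> shard : shards.entrySet()) {
            snapshots.add(new ShardSnapshot(shard.getKey(), shard.getValue().takeSerializedSnapshot()));
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(backupFile))) {
            out.writeObject(snapshots); // ArrayList and the record are both Serializable
        }
    }
}
```

The list is written as a single object, so the restore side can read the whole backup back with one readObject() call. Passing a LinkedHashMap keeps the file's entries in a stable order; nothing in the proposal requires that.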

For restore, we need to apply the serialized root node for each shard once recovery completes and before the shard is available for use. We can trigger a restore by looking for a backup file in a well-known directory, where the user would place it prior to startup. High-level steps:

  • On startup, read the backup file and deserialize the list of shard snapshots
  • Delete the backup file so it isn't processed on subsequent restarts
  • For each shard that has a snapshot to restore, pass the serialized snapshot to the shard on construction
  • When shard recovery completes, send an ApplySnapshot message to self. This persists the snapshot and applies it to the shard's state.
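A sketch of the restore side, under the same assumptions: the names are hypothetical, Java serialization is used, and the caller supplies the backup file path rather than the actual well-known directory. The Consumer passed to RestorableShard stands in for sending an ApplySnapshot message to self; the real shard is an Akka actor and would persist and apply the snapshot there.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public final class DatastoreRestore {
    /** Hypothetical backup entry: a shard name plus its serialized snapshot. */
    public record ShardSnapshot(String shardName, byte[] serializedSnapshot) implements Serializable {}

    private DatastoreRestore() {}

    /**
     * Reads the backup file if the user placed one, deletes it so it is not
     * processed on subsequent restarts, and returns the snapshots keyed by shard name.
     */
    public static Map<String, byte[]> loadPendingRestore(Path backupFile)
            throws IOException, ClassNotFoundException {
        if (!Files.exists(backupFile)) {
            return Collections.emptyMap(); // no backup file: normal startup
        }
        List<?> snapshots = readList(backupFile);
        Files.delete(backupFile); // only reached if the file was read and deserialized
        Map<String, byte[]> byShard = new HashMap<>();
        for (Object entry : snapshots) {
            ShardSnapshot snapshot = (ShardSnapshot) entry;
            byShard.put(snapshot.shardName(), snapshot.serializedSnapshot());
        }
        return byShard;
    }

    private static List<?> readList(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<?>) in.readObject();
        }
    }

    /** Hypothetical shard skeleton that is handed its restore snapshot on construction. */
    public static final class RestorableShard {
        private byte[] restoreFromSnapshot; // null when there is nothing to restore
        private final Consumer<byte[]> applySnapshotToSelf; // stands in for sending ApplySnapshot to self

        public RestorableShard(byte[] restoreFromSnapshot, Consumer<byte[]> applySnapshotToSelf) {
            this.restoreFromSnapshot = restoreFromSnapshot;
            this.applySnapshotToSelf = applySnapshotToSelf;
        }

        /** Called when shard recovery completes. */
        public void onRecoveryComplete() {
            if (restoreFromSnapshot != null) {
                applySnapshotToSelf.accept(restoreFromSnapshot);
                restoreFromSnapshot = null; // apply at most once
            }
        }
    }
}
```

In this sketch the file is deleted only after it has been read successfully. If deserialization fails, the exception propagates and the file stays in place for inspection instead of being silently discarded.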


 Comments   
Comment by Tom Pantelis [ 31/Oct/15 ]

Actually, just storing the serialized root node isn't going to work. We also need the last log index, term, election term info, etc., i.e. everything that's captured and stored in the Akka snapshot store. So we need to capture a snapshot for each shard and serialize the whole Snapshot instance.
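To carry the Raft metadata along with the data, the backup would serialize a whole snapshot object per shard rather than just the root node. The record below is an illustrative assumption of what such a snapshot holds. Its name and field names are hypothetical and are not claimed to match the controller's actual Snapshot class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public record ShardRaftSnapshot(
        byte[] state,            // serialized root node of the shard's data tree
        long lastIndex,          // index of the last log entry covered by the snapshot
        long lastTerm,           // term of that entry
        long lastAppliedIndex,   // last entry applied to the state
        long lastAppliedTerm,    // term of the last applied entry
        long electionTerm,       // current election term
        String electionVotedFor) // member voted for in that term, may be null
        implements Serializable {

    /** Serializes the whole snapshot, metadata included, for the backup file. */
    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        }
        return bytes.toByteArray();
    }

    /** Restores a snapshot previously produced by toBytes(). */
    public static ShardRaftSnapshot fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ShardRaftSnapshot) in.readObject();
        }
    }
}
```

With this shape, the byte[] stored per shard in the backup list would be the output of toBytes() instead of the bare serialized root node.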

Also, once we start persisting shard info in the ShardManager, we'll have to backup and restore that as well.

Comment by Tom Pantelis [ 15/Nov/15 ]

Patches:

https://git.opendaylight.org/gerrit/#/c/29085/
https://git.opendaylight.org/gerrit/#/c/29173/
https://git.opendaylight.org/gerrit/#/c/29215/
https://git.opendaylight.org/gerrit/#/c/29237/
https://git.opendaylight.org/gerrit/#/c/29280/
https://git.opendaylight.org/gerrit/#/c/29243/

Generated at Wed Feb 07 19:55:33 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.