Details
Bug
Status: Resolved
Resolution: Done
Beryllium
Operating System: All
Platform: All
4564
Normal
Description
We need to be able to do online backup and restore of the persisted datastore without disrupting the shards.
Proposed solution:
For backup, provide an RPC that takes the backup file name as input. High-level steps:
- For each shard, get a snapshot of the local data tree
- Read the root node from each data tree snapshot and serialize it
- Store the shard name and serialized snapshot for each shard in a list
- Serialize the list to the file
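
The backup steps above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the class names `DatastoreBackupSketch` and `ShardSnapshot` are hypothetical, the serialized root node is treated as an opaque `byte[]`, and plain Java serialization is assumed for the file format since the proposal does not specify one.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the backup side; names are not taken from the real code base.
class DatastoreBackupSketch {

    /** Shard name plus the serialized root node of that shard's data tree snapshot. */
    static final class ShardSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String shardName;
        final byte[] serializedRoot;

        ShardSnapshot(String shardName, byte[] serializedRoot) {
            this.shardName = shardName;
            this.serializedRoot = serializedRoot.clone();
        }
    }

    /**
     * Builds one list entry per shard and serializes the whole list to the backup file.
     * The map is assumed to hold the already-serialized root node for each shard.
     */
    static void backup(Map<String, byte[]> serializedRootsByShard, Path backupFile) throws IOException {
        List<ShardSnapshot> snapshots = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : serializedRootsByShard.entrySet()) {
            snapshots.add(new ShardSnapshot(entry.getKey(), entry.getValue()));
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(backupFile))) {
            out.writeObject(snapshots);
        }
    }
}
```

In the proposal, the RPC handler would collect the per-shard snapshots and call something like `backup(...)` with the file name supplied as RPC input.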
For restore, we need to apply the serialized root node for each shard on completion of recovery and before the shard is available for use. We can trigger a restore by looking for a backup file in a well-known directory, which the user places there prior to startup. High-level steps:
- On startup, read the backup file and deserialize the list of shard snapshots
- Delete the backup file so it isn't processed on subsequent restarts
- For each shard that has a snapshot to restore, pass the serialized snapshot on construction.
- When shard recovery completes, the shard sends an ApplySnapshot message to itself. This persists the snapshot and applies it to the shard's state.
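
The startup side of the restore could look roughly like the sketch below. It is a sketch under assumptions: `DatastoreRestoreSketch`, `ShardSnapshot` and `loadAndConsume` are hypothetical names, the file is assumed to hold a Java-serialized list of shard snapshots, and the shard-side handling of ApplySnapshot is not shown.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the restore side; names are not taken from the real code base.
class DatastoreRestoreSketch {

    /** Shard name plus the serialized root node of that shard's data tree snapshot. */
    static final class ShardSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String shardName;
        final byte[] serializedRoot;

        ShardSnapshot(String shardName, byte[] serializedRoot) {
            this.shardName = shardName;
            this.serializedRoot = serializedRoot.clone();
        }
    }

    /**
     * Reads the backup file if present, deletes it so it is not processed on later
     * restarts, and returns the serialized snapshot for each shard keyed by shard name.
     * Returns an empty map when there is no backup file, i.e. nothing to restore.
     */
    static Map<String, byte[]> loadAndConsume(Path backupFile) throws IOException, ClassNotFoundException {
        Map<String, byte[]> snapshotsByShard = new HashMap<>();
        if (!Files.exists(backupFile)) {
            return snapshotsByShard;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(backupFile))) {
            @SuppressWarnings("unchecked")
            List<ShardSnapshot> snapshots = (List<ShardSnapshot>) in.readObject();
            for (ShardSnapshot snapshot : snapshots) {
                snapshotsByShard.put(snapshot.shardName, snapshot.serializedRoot);
            }
        }
        Files.delete(backupFile);
        return snapshotsByShard;
    }
}
```

Each shard whose name appears in the returned map would receive its serialized snapshot on construction and apply it once recovery completes. Shards without an entry start up normally.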