Details
Bug
Status: Resolved
Resolution: Done
Post-Helium
None
None
Operating System: All
Platform: All
2948
High
Description
I created 60K entries in a YANG list, which triggered 3 snapshots. After restarting, however, I saw only 59945 entries; 55 were missing. The last snapshot contained the last 55 log entries as unapplied entries. They were present in the in-memory journal log but weren't applied to the state. 55 ApplyJournalEntries messages should have been recovered, but none were.
The problem is that the ApplyJournalEntries messages were deleted from the persisted journal log when we saved the snapshot. When we initiate a snapshot, the unapplied log entries are captured right away, but the snapshot data is captured asynchronously, so subsequent ReplicatedLogEntry and ApplyJournalEntries messages could already have been persisted by the time we save the snapshot. On save snapshot success, we delete messages up to the sequence number provided by Akka in the SaveSnapshotSuccess message. However, this is the sequence number at the time the snapshot is saved, so the deletion also covers any messages persisted after the snapshot was captured.
Solution:
We can obtain the last sequence number from Akka and record it when we capture the snapshot. On save snapshot success, we use the recorded sequence number, rather than the one in the SaveSnapshotSuccess message, when deleting messages.
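
To make the difference concrete, here is a minimal sketch of the two trimming strategies. It is a toy model only: it does not use the Akka persistence API, and the class `SnapshotTrimSketch`, its methods, and the `TreeMap`-backed journal are all hypothetical names chosen for illustration. Two messages are persisted after the snapshot is captured but before it is saved, which mirrors the race described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a persistent actor's journal; hypothetical, not the Akka API. */
class SnapshotTrimSketch {
    /** Persisted messages keyed by sequence number (stand-in for the real journal). */
    private final TreeMap<Long, String> journal = new TreeMap<>();
    private long lastSequenceNr = 0;
    /** Sequence number recorded at the moment the snapshot is captured. */
    private long capturedSequenceNr = -1;

    void persist(String message) {
        journal.put(++lastSequenceNr, message);
    }

    /** Snapshot capture starts: remember where the journal stood. */
    void captureSnapshot() {
        capturedSequenceNr = lastSequenceNr;
    }

    /** Buggy trim: deletes up to the sequence number reported at save time. */
    void onSaveSnapshotSuccessBuggy(long sequenceNrAtSave) {
        journal.headMap(sequenceNrAtSave, true).clear();
    }

    /** Fixed trim: deletes only up to the sequence number recorded at capture time. */
    void onSaveSnapshotSuccessFixed() {
        journal.headMap(capturedSequenceNr, true).clear();
    }

    List<String> remaining() {
        return new ArrayList<>(journal.values());
    }

    /** Two messages arrive between snapshot capture and snapshot save. */
    static SnapshotTrimSketch scenario() {
        SnapshotTrimSketch actor = new SnapshotTrimSketch();
        actor.persist("ReplicatedLogEntry 1");
        actor.persist("ApplyJournalEntries 1");
        actor.captureSnapshot();                 // capture begins at sequence number 2
        actor.persist("ReplicatedLogEntry 2");   // persisted while snapshot data is still being captured
        actor.persist("ApplyJournalEntries 2");
        return actor;
    }

    public static void main(String[] args) {
        SnapshotTrimSketch buggy = scenario();
        buggy.onSaveSnapshotSuccessBuggy(buggy.lastSequenceNr);
        System.out.println("buggy trim leaves: " + buggy.remaining());

        SnapshotTrimSketch fixed = scenario();
        fixed.onSaveSnapshotSuccessFixed();
        System.out.println("fixed trim leaves: " + fixed.remaining());
    }
}
```

With the buggy trim the journal ends up empty, so the later messages cannot be replayed on recovery. With the fixed trim the two messages persisted after the capture point survive.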