[CONTROLLER-1236] Clustering: Log entries may be missing or not applied on persistence recovery after prior snapshot
Created: 05/Apr/15  Updated: 23/Apr/15  Resolved: 23/Apr/15

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Post-Helium
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis
Assignee: Tom Pantelis
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 2948
Priority: High

 Description   

I created 60K entries in a YANG list, which triggered 3 snapshots. However, after restarting I saw only 59945 entries; 55 were missing. The last snapshot contained the last 55 unapplied log entries. They were present in the in-memory journal log but weren't applied to the state. There should have been 55 ApplyJournalEntries messages recovered, but none were.

The problem is that the ApplyJournalEntries messages were deleted from the persisted journal log when we saved the snapshot. When we initiate a snapshot, the unapplied log entries are captured, but the snapshot data is captured asynchronously, so subsequent ReplicatedLogEntry and ApplyJournalEntries messages could have been saved to persistence by the time we save the snapshot. On save snapshot success, we delete messages up to the sequence number provided by akka in the SaveSnapshotSuccess message. However, this is the sequence number at the time the snapshot is saved, so the deletion also covers any messages saved after the snapshot was captured.

Solution:

We can obtain the last sequence number from akka and record it when we capture the snapshot. On save snapshot success, use the recorded sequence number when deleting messages.
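
To illustrate the difference between the two deletion points, here is a minimal, self-contained sketch. It does not use akka or the controller code: ToyJournal, SnapshotTrimSketch and deleteMessagesTo are hypothetical names invented for this example, and the journal is just an in-memory map keyed by sequence number. It only models the ordering of events described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy stand-in for a persisted journal (hypothetical; not the akka API). */
class ToyJournal {
    private final TreeMap<Long, String> messages = new TreeMap<>();
    private long lastSequenceNr = 0;

    /** Stores a message under the next sequence number and returns that number. */
    long persist(String message) {
        messages.put(++lastSequenceNr, message);
        return lastSequenceNr;
    }

    long lastSequenceNr() {
        return lastSequenceNr;
    }

    /** Deletes every message with a sequence number up to and including toSequenceNr. */
    void deleteMessagesTo(long toSequenceNr) {
        messages.headMap(toSequenceNr, true).clear();
    }

    /** Returns what a recovery would replay, in sequence order. */
    List<String> replay() {
        return new ArrayList<>(messages.values());
    }
}

class SnapshotTrimSketch {
    /**
     * Runs one snapshot cycle. If useRecordedSequenceNr is true, the journal is
     * trimmed to the sequence number recorded at capture time (the proposed fix);
     * otherwise it is trimmed to the sequence number seen at save time (the bug).
     */
    static List<String> run(boolean useRecordedSequenceNr) {
        ToyJournal journal = new ToyJournal();
        journal.persist("ReplicatedLogEntry-1");
        journal.persist("ApplyJournalEntries-1");

        // Snapshot capture is initiated: record the last sequence number now.
        long recordedAtCapture = journal.lastSequenceNr();

        // These arrive while the snapshot data is still being captured asynchronously,
        // so they are not reflected in the snapshot state.
        journal.persist("ReplicatedLogEntry-2");
        journal.persist("ApplyJournalEntries-2");

        // The snapshot save completes: this models the sequence number reported at save time.
        long reportedAtSave = journal.lastSequenceNr();

        journal.deleteMessagesTo(useRecordedSequenceNr ? recordedAtCapture : reportedAtSave);
        return journal.replay();
    }

    public static void main(String[] args) {
        System.out.println("trim to save-time sequence number: " + run(false));
        System.out.println("trim to recorded sequence number:  " + run(true));
    }
}
```

In this toy model, trimming to the save-time sequence number leaves nothing to replay, so the messages persisted during the capture are lost, while trimming to the recorded sequence number keeps them for recovery.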



 Comments   
Comment by Tom Pantelis [ 08/Apr/15 ]

Submitted patches:

https://git.opendaylight.org/gerrit/#/c/17795
https://git.opendaylight.org/gerrit/#/c/17804
https://git.opendaylight.org/gerrit/#/c/17796
https://git.opendaylight.org/gerrit/#/c/17933
