[CONTROLLER-1236] Clustering: Log entries may be missing or not applied on persistence recovery after prior snapshot Created: 05/Apr/15 Updated: 23/Apr/15 Resolved: 23/Apr/15 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Post-Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Tom Pantelis | Assignee: | Tom Pantelis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 2948 |
| Priority: | High |
| Description |
|
I created 60K entries in a yang list, which triggered 3 snapshots. However, after restarting I saw only 59945 entries; 55 were missing.

The last snapshot contained the last 55 unapplied log entries. They were present in the in-memory journal log but weren't applied to the state. There should have been 55 ApplyJournalEntries messages recovered, but none were.

The problem is that the ApplyJournalEntries messages were deleted from the persisted journal log when we saved the snapshot. When we initiate a snapshot, the unapplied log entries are captured, but the snapshot data is captured asynchronously, so subsequent ReplicatedLogEntry and ApplyJournalEntries messages could have been saved to persistence by the time we save the snapshot. On save snapshot success, we delete messages up to the sequence number provided by Akka in the SaveSnapshotSuccess message. However, this is the sequence number obtained at the time the snapshot is saved, so the deletion also covers any messages saved after the snapshot was captured.

Solution: We can obtain the last sequence number from Akka and record it when we capture the snapshot. On save snapshot success, use the recorded sequence number when deleting messages. |
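The proposed fix can be illustrated with a minimal, self-contained sketch. It does not use the Akka API or the actual controller code: `Journal`, `SnapshotTrimSketch`, and `survivingMessages` are hypothetical names, and the message counts (100 before capture, 55 after) are chosen only to mirror the scenario described above. It assumes that each persisted message receives the next sequence number and that deletion removes every message up to and including the given number.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; Journal is a hypothetical stand-in, not the Akka journal API. */
final class SnapshotTrimSketch {

    /** In-memory journal: every persisted message gets the next sequence number. */
    static final class Journal {
        private final List<Long> sequenceNumbers = new ArrayList<>();
        private long lastSequenceNumber = 0;

        long persist() {
            sequenceNumbers.add(++lastSequenceNumber);
            return lastSequenceNumber;
        }

        long lastSequenceNumber() {
            return lastSequenceNumber;
        }

        /** Deletes all messages with a sequence number <= toSequenceNumber. */
        void deleteMessagesTo(long toSequenceNumber) {
            sequenceNumbers.removeIf(s -> s <= toSequenceNumber);
        }

        int size() {
            return sequenceNumbers.size();
        }
    }

    /**
     * Simulates one snapshot cycle and returns how many messages remain in the
     * journal afterwards.
     *
     * @param useRecordedSequenceNumber true for the proposed fix, false for the
     *                                  behavior described in the bug report
     */
    static int survivingMessages(boolean useRecordedSequenceNumber) {
        Journal journal = new Journal();

        // Messages persisted before the snapshot is initiated.
        for (int i = 0; i < 100; i++) {
            journal.persist();
        }

        // Snapshot capture starts: record the last sequence number right now.
        long recordedAtCapture = journal.lastSequenceNumber();

        // The snapshot data is captured asynchronously, so more messages are
        // persisted before the snapshot is actually saved.
        for (int i = 0; i < 55; i++) {
            journal.persist();
        }

        // Sequence number reported when the save succeeds (taken at save time).
        long reportedAtSave = journal.lastSequenceNumber();

        journal.deleteMessagesTo(useRecordedSequenceNumber ? recordedAtCapture : reportedAtSave);
        return journal.size();
    }

    public static void main(String[] args) {
        System.out.println("save-time sequence number: " + survivingMessages(false) + " messages survive");
        System.out.println("recorded sequence number:  " + survivingMessages(true) + " messages survive");
    }
}
```

In this model, deleting up to the save-time sequence number leaves no messages in the journal, while deleting up to the number recorded at capture keeps the 55 messages persisted after the capture began, so they remain available for recovery.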
| Comments |
| Comment by Tom Pantelis [ 08/Apr/15 ] |
|
Submitted patches: https://git.opendaylight.org/gerrit/#/c/17795 |