[CONTROLLER-1568] Potential incorrect state in snapshot initiated via Follower.handleAppendEntries Created: 14/Dec/16 Updated: 06/Jan/17 Resolved: 06/Jan/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | 0.4.0 |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Kangqian Yin | Assignee: | Tom Pantelis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
| External issue ID: | 7362 |
| Description |
|
Follower.handleAppendEntries works fine with an AppendEntries message containing zero or one log entries, but it cannot correctly handle an AppendEntries message with more than one log entry, because it calls applyLogToStateMachine only for the last log entry in the AppendEntries message. All other log entries won't be applied to the state machine. |
| Comments |
| Comment by Tom Pantelis [ 14/Dec/16 ] |
|
I don't follow what you mean by "it calls applyLogToStateMachine only for the last log entry in the AppendEntries message". It passes the current leader commit index to applyLogToStateMachine. handleAppendEntries processes each entry in the AppendEntries message and adds each to the log (if it's not a duplicate). applyLogToStateMachine is called to determine which ones can be applied to the state based on the leader commit index. This code works - I'm not clear on exactly what you think is wrong with it. |
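A minimal, self-contained sketch of the flow described above, assuming simplified stand-in types (SimpleFollowerLog, LogEntry) rather than the actual sal-akka-raft classes: each entry in the AppendEntries message is appended to the log, and applyLogToStateMachine then applies every unapplied entry up to the leader commit index, not only the last one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch; names do not mirror the real sal-akka-raft API.
public class SimpleFollowerLog {
    static final class LogEntry {
        final long index;
        final String data;
        LogEntry(long index, String data) { this.index = index; this.data = data; }
    }

    private final List<LogEntry> log = new ArrayList<>();
    private long lastApplied = -1;

    long lastIndex() {
        return log.isEmpty() ? -1 : log.get(log.size() - 1).index;
    }

    // Analogous to handleAppendEntries: append each new entry (skip duplicates),
    // then apply everything up to the leader's commit index.
    void handleAppendEntries(List<LogEntry> entries, long leaderCommit) {
        for (LogEntry entry : entries) {
            if (entry.index > lastIndex()) {   // skip entries we already have
                log.add(entry);
            }
        }
        applyLogToStateMachine(Math.min(leaderCommit, lastIndex()));
    }

    // Analogous to applyLogToStateMachine: applies every entry from
    // lastApplied + 1 through applyToIndex, not only the last one.
    void applyLogToStateMachine(long applyToIndex) {
        for (long i = lastApplied + 1; i <= applyToIndex; i++) {
            LogEntry entry = log.get((int) i);   // assumes log indices start at 0
            System.out.println("applying " + entry.index + ": " + entry.data);
            lastApplied = i;
        }
    }

    public static void main(String[] args) {
        SimpleFollowerLog follower = new SimpleFollowerLog();
        follower.handleAppendEntries(
            List.of(new LogEntry(0, "a"), new LogEntry(1, "b"), new LogEntry(2, "c")), 2);
        // Prints "applying 0", "applying 1", "applying 2": all three entries are applied.
    }
}
```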
| Comment by Tom Pantelis [ 14/Dec/16 ] |
|
(In reply to Tom Pantelis from comment #1) Also, there are several test cases in LeaderTest that test multiple log entries in an AppendEntries message sent to a Follower. |
| Comment by Kangqian Yin [ 15/Dec/16 ] |
|
Sorry, my understanding of applyLogToStateMachine was wrong: it applies all log entries up to the input index. My understanding of where the bug is was also wrong. The actual bug, when handleAppendEntries handles an AppendEntries message with more than one log entry, is that the lastAppliedIndex written in the snapshot captured by the call to captureSnapshotIfReady in appendAndPersistCallback may be larger than the lastIndex written in the snapshot. Such a relationship violates the Raft protocol and may cause log entries to be missed during shard recovery from the snapshots and journals. The illegal lastAppliedIndex comes from the asynchronous execution of appendAndPersistCallback and applyLogToStateMachine: when applyLogToStateMachine has finished applying all of the log entries, appendAndPersistCallback may still be processing the first or second entry. At that point, the lastAppliedIndex in the actor context has already been set to the last index of the log entries. If appendAndPersistCallback then triggers a snapshot, the lastAppliedIndex written in the snapshot may be the last index of the log entries, which is larger than the lastIndex written, which is just the index of the log entry passed to appendAndPersistCallback. I also suspect the toIndex written in the journals by AppendJournalEntries messages may be wrong, but I have not reproduced that in my testing. However, the wrong lastAppliedIndex written in the snapshot has been easily reproduced many times in my testing with our customized version of the ODL controller. The original ODL controller always sends AppendEntries messages with only one log entry, so it won't reveal this bug. |
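A minimal sketch of the interleaving described in this comment, using an invented Snapshot value type rather than the real persistence callbacks: if applyLogToStateMachine finishes before the async persist callback for an earlier entry fires, the snapshot captured from that callback can record a lastAppliedIndex larger than its lastIndex, breaking the Raft invariant lastAppliedIndex <= lastIndex.

```java
// Hypothetical sketch of the race; the Snapshot class and step-by-step calls are
// stand-ins for the real appendAndPersist/captureSnapshotIfReady machinery.
public class SnapshotRaceSketch {
    static final class Snapshot {
        final long lastIndex;        // last entry index persisted by the callback so far
        final long lastAppliedIndex; // follower-wide applied index at capture time
        Snapshot(long lastIndex, long lastAppliedIndex) {
            this.lastIndex = lastIndex;
            this.lastAppliedIndex = lastAppliedIndex;
        }
        boolean violatesRaftInvariant() {
            // A captured snapshot must satisfy lastAppliedIndex <= lastIndex.
            return lastAppliedIndex > lastIndex;
        }
    }

    public static void main(String[] args) {
        // Follower receives AppendEntries with entries 5..7 and leaderCommit = 7.
        long lastApplied = 4;

        // applyLogToStateMachine runs to completion first: entries 5..7 are applied.
        lastApplied = 7;

        // Meanwhile the async persist callback for entry 5 fires and decides to
        // capture a snapshot (captureSnapshotIfReady in the real code).
        Snapshot captured = new Snapshot(/* lastIndex */ 5, /* lastAppliedIndex */ lastApplied);

        System.out.println("snapshot lastIndex=" + captured.lastIndex
            + " lastAppliedIndex=" + captured.lastAppliedIndex
            + " violatesInvariant=" + captured.violatesRaftInvariant()); // true
    }
}
```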
| Comment by Tom Pantelis [ 15/Dec/16 ] |
|
I think what you're saying is similar to what I encountered when I introduced async persistence for leader persist and replicate recently where, on snapshot, the last log entry is neither in the unapplied entries nor in the captured serialized snapshot state. See comment on line 326 in https://git.opendaylight.org/gerrit/#/c/48441/6/opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/behaviors/AbstractLeader.java. This can occur when there's a pending ApplyState message in the queue when snapshot capture is initiated, resulting in the pending entry being included in lastAppliedIndex but not yet applied to the shard state (i.e. data tree). I was able to work around the issue in that case, but it sounds like you've reproduced it in the follower, which looks plausible from the code. The follower code still uses the "synchronous" persistence so the issue has been latent for a long time. If you have a unit test case that reproduces it, please submit a patch for it. I think the ideal and simplest solution is what I commented in AbstractLeader, i.e. the updating of lastAppliedIndex and the actual application to the state should be done synchronously/atomically with respect to each other. I propose we eliminate the ApplyState message and invoke applyState directly from applyLogToStateMachine. |
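A hedged sketch of the proposed change, with invented names (StateMachine, Entry) rather than the actual sal-akka-raft API: instead of sending an ApplyState message back to the actor, each entry is applied and lastApplied is advanced in the same synchronous step, so a snapshot captured at any point observes a consistent lastApplied.

```java
import java.util.List;

// Hypothetical sketch of the proposed synchronous apply path.
public class SynchronousApplySketch {
    interface StateMachine {
        void applyState(long index, String data);   // e.g. apply to the shard data tree
    }

    static final class Entry {
        final long index;
        final String data;
        Entry(long index, String data) { this.index = index; this.data = data; }
    }

    private final StateMachine stateMachine;
    private long lastApplied = -1;

    SynchronousApplySketch(StateMachine stateMachine) {
        this.stateMachine = stateMachine;
    }

    // Proposed behavior: apply each entry and update lastApplied together,
    // with no ApplyState message in between.
    void applyLogToStateMachine(List<Entry> unapplied) {
        for (Entry entry : unapplied) {
            stateMachine.applyState(entry.index, entry.data); // direct call, not a message
            lastApplied = entry.index;                        // updated in the same step
        }
    }

    long getLastApplied() {
        return lastApplied;
    }

    public static void main(String[] args) {
        SynchronousApplySketch sketch = new SynchronousApplySketch(
            (index, data) -> System.out.println("applied " + index + ": " + data));
        sketch.applyLogToStateMachine(List.of(new Entry(5, "a"), new Entry(6, "b")));
        System.out.println("lastApplied=" + sketch.getLastApplied()); // 6
    }
}
```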
| Comment by Tom Pantelis [ 15/Dec/16 ] |
|
Submitted a unit test case that reproduces the issue: https://git.opendaylight.org/gerrit/#/c/49404/ Normally this issue doesn't occur because the leader advances the commit index only after it achieves replication consensus. So normally followers will first receive an AppendEntries with the log entry, which appends it, followed by another AppendEntries with the updated leader commit, which applies it to the state. Therefore the first AppendEntries does not update the lastAppliedIndex and, if a snapshot occurs, the captured snapshot state is correct wrt lastAppliedIndex and the log entry is included in the snapshot's unapplied entries. This issue occurs if an AppendEntries log entry is both appended (with a snapshot triggered) and applied to the state, which can happen if a follower is lagging or catching up after a restart. |
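A minimal sketch of the normal two-message flow described above, again with invented types rather than the sal-akka-raft API: because the first AppendEntries does not advance lastApplied, a snapshot captured during its persist callback stays consistent; only the catch-up case, where a single message both appends and applies, opens the window described earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the normal two-round-trip flow; not the real follower code.
public class NormalFlowSketch {
    private final List<Long> log = new ArrayList<>();   // stores entry indices only
    private long lastApplied = -1;

    long lastIndex() {
        return log.isEmpty() ? -1 : log.get(log.size() - 1);
    }

    void handleAppendEntries(List<Long> entryIndices, long leaderCommit) {
        log.addAll(entryIndices);
        // Snapshot hypothetically captured from the persist callback at this point:
        System.out.printf("capture: lastIndex=%d lastApplied=%d consistent=%b%n",
            lastIndex(), lastApplied, lastApplied <= lastIndex());
        // Apply only up to the leader commit index.
        lastApplied = Math.min(leaderCommit, lastIndex());
    }

    public static void main(String[] args) {
        NormalFlowSketch follower = new NormalFlowSketch();
        follower.log.add(4L);
        follower.lastApplied = 4;

        // Message 1: new entry 5, leaderCommit still 4 -> entry 5 is appended, not
        // applied; a snapshot here is consistent (lastApplied=4 <= lastIndex=5).
        follower.handleAppendEntries(List.of(5L), 4);

        // Message 2: no new entries, leaderCommit advanced to 5 -> entry 5 is applied.
        follower.handleAppendEntries(List.of(), 5);
    }
}
```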
| Comment by Kangqian Yin [ 15/Dec/16 ] |
|
Great work! That was fast to reproduce in a unit test, and a good example of professional test-driven development. I'll learn from you, Tom; I still live in the manual-testing age. |
| Comment by Tom Pantelis [ 15/Dec/16 ] |
|
(In reply to Kangqian Yin from comment #6) Thanks. I usually try to reproduce bugs in a unit test first. However, it was your manual testing that uncovered it in the first place, so thanks for the diligent testing. This bug has been latent for quite some time... |
| Comment by Tom Pantelis [ 24/Dec/16 ] |