[CONTROLLER-1513] sal-akka-raft: separate Shard and RaftActor Created: 25/Apr/16 Updated: 25/Jul/23 Resolved: 18/May/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Robert Varga | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||||||
| External issue ID: | 5800 | ||||||||||||
| Description |
|
In our current implementation, RaftActor is an abstract class which defines callbacks for its subclasses, one of which is Shard. This separation is not well aligned with how services are decomposed in Akka, and it creates tight coupling between the two components. That coupling is used for three things:
The first need can easily be satisfied by messaging between Shard and RaftActor. The second one is stickier, but as it turns out we want to make Shard persistence asynchronous anyway, hence it can be turned into messaging between parent and child actors. The third one can easily be solved by having Shard as the parent actor, which requests handoff and shuts the RaftActor (child) down once the handoff is completed. |
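The parent/child handoff described above could look roughly like the following. This is a minimal sketch in plain Java with no Akka dependency: the mailbox is modelled as a queue, and the names `ShardParent`, `RaftChild`, `REQUEST_HANDOFF`, `HANDOFF_COMPLETE` and `STOP` are hypothetical and do not come from the controller code base.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ShardHandoffSketch {
    // Hypothetical message types, invented for this illustration only.
    enum Msg { REQUEST_HANDOFF, HANDOFF_COMPLETE, STOP }

    /** Stand-in for the child actor that would run the Raft protocol. */
    static final class RaftChild {
        final Queue<Msg> mailbox = new ArrayDeque<>();
        boolean stopped;

        // Processes one queued message and returns the reply for the parent, or null.
        Msg processOne() {
            Msg msg = mailbox.poll();
            if (msg == Msg.REQUEST_HANDOFF) {
                return Msg.HANDOFF_COMPLETE; // pretend leadership was transferred
            }
            if (msg == Msg.STOP) {
                stopped = true;
            }
            return null;
        }
    }

    /** Stand-in for the parent Shard, which owns the child's lifecycle. */
    static final class ShardParent {
        final RaftChild child = new RaftChild();
        final List<String> log = new ArrayList<>();

        void shutdown() {
            log.add("shard: request handoff");
            child.mailbox.add(Msg.REQUEST_HANDOFF);
            Msg reply = child.processOne();
            // The child is stopped only after it has confirmed the handoff.
            if (reply == Msg.HANDOFF_COMPLETE) {
                log.add("shard: handoff complete, stopping child");
                child.mailbox.add(Msg.STOP);
                child.processOne();
            }
        }
    }

    public static void main(String[] args) {
        ShardParent shard = new ShardParent();
        shard.shutdown();
        shard.log.forEach(System.out::println);
        System.out.println("child stopped: " + shard.child.stopped);
    }
}
```

The point of the sketch is only the ordering: the parent decides when to shut down, and the child is stopped strictly after it reports that the handoff is done.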
| Comments |
| Comment by Robert Varga [ 03/Aug/16 ] |
|
This will also fix a problem with Shard.applyState(): currently Shard.applyState() is called via an internal RaftActor message (ApplyState), which is enqueued on the shared actor. This message holds information which is only accurate for as long as Shard remains the leader, but since it is enqueued on the same mailbox as Raft RPCs are, the role may have changed to follower by the time it is processed. Separating the actors will have the effect of ApplyState being processed on Shard's queue, hence Shard will be observing the same (leader) state, simply because any state-changing messages will remain queued behind it. |
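The ordering argument above can be illustrated with a small simulation. This is a sketch under assumptions, not the actual RaftActor logic: mailboxes are plain queues, and the message names `HIGHER_TERM_RPC` and `ROLE_CHANGED_TO_FOLLOWER` are hypothetical stand-ins for whatever Raft RPC and notification would cause and announce the step-down.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class MailboxOrderingSketch {
    enum Role { LEADER, FOLLOWER }

    // Hypothetical message names, used only for this illustration.
    enum Msg { HIGHER_TERM_RPC, APPLY_STATE, ROLE_CHANGED_TO_FOLLOWER }

    /** One shared mailbox: returns the role in effect when APPLY_STATE is processed. */
    static Role sharedMailbox() {
        Role role = Role.LEADER;
        Queue<Msg> mailbox = new ArrayDeque<>();
        mailbox.add(Msg.HIGHER_TERM_RPC); // an RPC already waiting in the mailbox
        mailbox.add(Msg.APPLY_STATE);     // self-sent while still leader, lands behind it
        Role observed = null;
        for (Msg m = mailbox.poll(); m != null; m = mailbox.poll()) {
            if (m == Msg.HIGHER_TERM_RPC) {
                role = Role.FOLLOWER;     // the RPC makes the actor step down
            } else if (m == Msg.APPLY_STATE) {
                observed = role;
            }
        }
        return observed;
    }

    /** Separate mailboxes: returns the role Shard sees when it processes APPLY_STATE. */
    static Role separateMailboxes() {
        Role shardView = Role.LEADER;
        Queue<Msg> raftMailbox = new ArrayDeque<>();
        Queue<Msg> shardMailbox = new ArrayDeque<>();

        shardMailbox.add(Msg.APPLY_STATE);    // Raft child, still leader, forwards the commit
        raftMailbox.add(Msg.HIGHER_TERM_RPC); // the step-down RPC arrives at the Raft child

        // The Raft child steps down and notifies Shard; the notice queues behind APPLY_STATE.
        for (Msg m = raftMailbox.poll(); m != null; m = raftMailbox.poll()) {
            if (m == Msg.HIGHER_TERM_RPC) {
                shardMailbox.add(Msg.ROLE_CHANGED_TO_FOLLOWER);
            }
        }

        Role observed = null;
        for (Msg m = shardMailbox.poll(); m != null; m = shardMailbox.poll()) {
            if (m == Msg.APPLY_STATE) {
                observed = shardView;
            } else if (m == Msg.ROLE_CHANGED_TO_FOLLOWER) {
                shardView = Role.FOLLOWER;
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("shared mailbox:     " + sharedMailbox());
        System.out.println("separate mailboxes: " + separateMailboxes());
    }
}
```

In the shared case ApplyState is handled after the role has already changed; in the separated case Shard still sees the leader role, because the role-change notice sits behind ApplyState in its own queue.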
| Comment by Tom Pantelis [ 18/May/17 ] |
|
State is now applied synchronously to fix a consistency issue with persisted snapshots with respect to lastAppliedIndex and the actual applied state. Separating the actors would likely re-introduce this issue. Regardless, there's no compelling reason to separate the actors anymore, and doing so would add considerable complexity and risk (and possibly performance degradation). The reason the idea was first put forth was to alleviate follower election timeouts when the Shard leader was busy processing transaction messages. However, this issue was alleviated by having the follower check the reachability of the leader in the Akka cluster state on election timeout. Also, we could always add an actor that just does a keep-alive (I had prototyped this prior to going with the reachability check). |
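For readers unfamiliar with the reachability check mentioned above, the decision it implies can be sketched as follows. This is an assumption-laden illustration, not the shipped code: the method `onElectionTimeout`, the `Action` enum and the member-address strings are hypothetical, and the set of unreachable members stands in for whatever the Akka cluster state reports.

```java
import java.util.Collections;
import java.util.Set;

public class ElectionTimeoutSketch {
    enum Action { START_ELECTION, KEEP_FOLLOWING }

    /**
     * Decides what a follower does when its election timer fires.
     * leaderAddress may be null if no leader is known yet.
     */
    static Action onElectionTimeout(String leaderAddress, Set<String> unreachableMembers) {
        // No known leader, or the cluster layer reports it unreachable: start an election.
        if (leaderAddress == null || unreachableMembers.contains(leaderAddress)) {
            return Action.START_ELECTION;
        }
        // The leader is still reachable, so it is presumably just busy; keep following.
        return Action.KEEP_FOLLOWING;
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> leaderDown = Collections.singleton("member-1");
        System.out.println(onElectionTimeout("member-1", none));
        System.out.println(onElectionTimeout("member-1", leaderDown));
    }
}
```

A busy but reachable leader therefore no longer triggers an election just because its heartbeats were delayed.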