[CONTROLLER-1534] Introduce 'dormant' Shard follower state Created: 02/Aug/16  Updated: 25/Jul/23

Status: Confirmed
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Reporter: Robert Varga Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All



 Description   

Our current implementation requires all followers to keep the Shard state in two places: the RaftActor journal and the DataTree. This means that the followers have to process incoming snapshots and payloads. This is an inherent trade-off.

It gives us quicker failover times, as the follower DataTree is able to immediately start processing transactions. It also provides an early safety net for catching failures, as payloads which fail to apply to the DataTree are immediately logged. Finally, it provides the ability to serve DataTreeCandidates (although we do not implement it yet).

On the other hand it costs us CPU time (to parse the snapshots and apply them) and heap (to hold the DataTree).

Introduce an optional 'dormant' follower state, which will keep RaftActor state, but will not maintain the DataTree. This needs to be driven by the RaftActor, because the transition from dormant to active follower needs to happen before RaftActor acknowledges becoming the leader.
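
The ordering constraint above can be illustrated with a minimal sketch. It is not based on the actual RaftActor or Shard code: the class DormantFollowerSketch, its Payload type and all method names are hypothetical, the journal is a plain in-memory list, and a HashMap stands in for the DataTree. It only shows the intended behaviour: a dormant follower records payloads without applying them, and the DataTree is rebuilt from the journal before leadership is acknowledged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in the controller code base.
final class DormantFollowerSketch {
    enum Mode { ACTIVE, DORMANT }

    // Stand-in for a replicated journal entry: a single write to a path.
    static final class Payload {
        final String path;
        final String value;

        Payload(String path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    private final List<Payload> journal = new ArrayList<>();
    private Map<String, String> dataTree; // stand-in for the DataTree, null while dormant
    private Mode mode;
    private boolean leader;

    DormantFollowerSketch(boolean dormant) {
        this.mode = dormant ? Mode.DORMANT : Mode.ACTIVE;
        this.dataTree = dormant ? null : new HashMap<>();
    }

    // Every follower keeps the journal; only an active one also applies the payload.
    void append(Payload payload) {
        journal.add(payload);
        if (mode == Mode.ACTIVE) {
            dataTree.put(payload.path, payload.value);
        }
    }

    // The dormant-to-active transition completes before leadership is acknowledged.
    void becomeLeader() {
        if (mode == Mode.DORMANT) {
            Map<String, String> rebuilt = new HashMap<>();
            for (Payload payload : journal) {
                rebuilt.put(payload.path, payload.value);
            }
            dataTree = rebuilt;
            mode = Mode.ACTIVE;
        }
        leader = true;
    }

    Mode mode() {
        return mode;
    }

    boolean isLeader() {
        return leader;
    }

    boolean hasDataTree() {
        return dataTree != null;
    }

    String read(String path) {
        if (dataTree == null) {
            throw new IllegalStateException("dormant follower holds no DataTree");
        }
        return dataTree.get(path);
    }
}
```

In this sketch the cost of replaying the journal is paid at failover time, which is the slower-failover side of the trade-off described above.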

This will not eliminate occasional spikes, as taking a journal snapshot will still require a burst of processing during which the DataTree is used. Proper resource sizing (journal size) and explicit ranged snapshotting can alleviate this by moving the processing to off-peak hours.



 Comments   
Comment by Tom Pantelis [ 22/Aug/16 ]

We need to keep the DataTree in followers in order to trigger DTCLs.

Generated at Wed Feb 07 19:55:47 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.