[CONTROLLER-1589] Current raft implementation seems to be unstable when dynamically adding peers when new nodes come up. Created: 27/Jan/17  Updated: 25/Jul/23  Resolved: 13/Apr/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Tomas Cere Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks CONTROLLER-1588 Switch prefix shard configuration bac... Resolved
External issue ID: 7696

 Description   

Scenario we are trying to achieve:

We want a shard to be started with replicas on all cluster members (including ones added at any time in the future).

Currently the raft implementation seems to be quite fragile when dynamically adding peers. When you send an AddServer message to the actor on MemberUp/Reachable events, the followers seem to always ignore it. Each peer then ends up with a different set of peers, which leads to never-ending elections.
The current implementation seems to rely on having the replicas hardcoded in the module conf. If we took a route similar to what the ClusterAdminService addReplica does, that also seems to update the peers only on the leader, which again leads to never-ending elections because each actor has a different set of peers.
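
To make the suspected divergence concrete, here is a minimal, self-contained Java sketch. It is not the controller's raft code: `PeerViewSketch` and its methods are hypothetical names used only for illustration, and the model assumes each member decides quorum from its own view of the voting set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model (not the controller's RaftActor): every member keeps
// its own view of the voting set, including itself.
final class PeerViewSketch {
    final Map<String, Set<String>> views = new HashMap<>();

    PeerViewSketch(String... members) {
        for (String m : members) {
            views.put(m, new TreeSet<>(Arrays.asList(members)));
        }
    }

    // Models a peer update applied on one member only, with no replication.
    void addPeerLocally(String onMember, String newPeer) {
        views.get(onMember).add(newPeer);
    }

    // Votes needed to win an election according to one member's view.
    int majority(String member) {
        return views.get(member).size() / 2 + 1;
    }

    // True only when every member agrees on the voting set.
    boolean consistent() {
        return views.values().stream().distinct().count() == 1;
    }
}
```

With three members and a fourth peer added only on `member-1`, `consistent()` becomes false and `majority("member-1")` differs from `majority("member-2")`, so the members no longer agree on what counts as a quorum.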



 Comments   
Comment by Robert Varga [ 31/Jan/17 ]

Based on the discussion on Clustering Hackers' call, the configuration update should work as follows:

  • ShardManager creates the local replica
  • ShardManager sends AddServer to the leader
  • The leader will catch up the new follower via an ApplySnapshot
  • It then updates its configuration and replicates to all followers

So sending AddServer to followers should not be necessary.
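
The four steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not the actual ShardManager or raft code: `AddServerSketch`, `Replica`, and `addServer` are hypothetical names, message passing is replaced by direct method calls, and the snapshot is modeled as a plain copy of the leader's state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation of the AddServer flow; not the controller's API.
final class AddServerSketch {
    static final class Replica {
        final String id;
        final List<String> state = new ArrayList<>();
        Set<String> peers = new TreeSet<>(); // voting members, including self

        Replica(String id) {
            this.id = id;
        }
    }

    final Map<String, Replica> replicas = new LinkedHashMap<>();
    final String leaderId;

    AddServerSketch(String leaderId, String... followerIds) {
        this.leaderId = leaderId;
        Set<String> all = new TreeSet<>(Arrays.asList(followerIds));
        all.add(leaderId);
        for (String id : all) {
            Replica r = new Replica(id);
            r.peers = new TreeSet<>(all);
            replicas.put(id, r);
        }
    }

    // Models AddServer being handled by the leader only.
    void addServer(String newId) {
        Replica leader = replicas.get(leaderId);

        // Step 1: the local replica is created on the new member.
        Replica fresh = new Replica(newId);

        // Step 3: the leader catches the new follower up with a snapshot.
        fresh.state.addAll(leader.state);
        replicas.put(newId, fresh);

        // Step 4: the leader updates its configuration and replicates it
        // to every follower, so no follower needs its own AddServer.
        Set<String> newConfig = new TreeSet<>(leader.peers);
        newConfig.add(newId);
        for (Replica r : replicas.values()) {
            r.peers = new TreeSet<>(newConfig);
        }
    }
}
```

In this model every replica ends up with the same peer set after `addServer`, which is the property the flow is meant to guarantee.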

Comment by Tom Pantelis [ 16/Feb/17 ]

Is there an actual issue here or can we close this?

Comment by Jakub Morvay [ 16/Feb/17 ]

Hi Tom,

I have tried the approach mentioned above and have been able to add new shards. However, I have seen problems with creating shards in some scenarios, possibly because of bugs in our shard starting logic.

I will try to fix them and then see whether this is an actual issue. If not, I will close the bug.

Generated at Wed Feb 07 19:55:55 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.