[CONTROLLER-934] Clustering : Programmatic Cluster configuration Created: 14/Oct/14  Updated: 07/Jan/16  Resolved: 07/Jan/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Helium
Fix Version/s: None

Type: Improvement
Reporter: Mark Mozolewski Assignee: Kalaiselvi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Microsoft Word Bug 2187_design.docx    

 Description   

Cluster members are defined statically in akka.conf. Any change to the cluster configuration requires editing this file on each node and restarting the controllers.

Update the controller to provide a way to programmatically configure a cluster, including options that are currently part of akka.conf and potentially others we want to support; the relevant akka.conf section is sketched after this list. In general:

  • The controller's own address (IP/hostname)
  • Other cluster members' addresses (IP/hostname), including Akka "seed nodes"
  • Akka-specific config (ActorSystem config, logging, serializer, etc.)
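
For reference, a minimal sketch of the static akka.conf section in question, assuming the stock layout shipped with the distribution (addresses and member name are illustrative):

    odl-cluster-data {
      akka {
        remote {
          netty.tcp {
            hostname = "10.0.0.1"   # this controller's address
            port = 2550
          }
        }
        cluster {
          seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.0.0.1:2550",
                        "akka.tcp://opendaylight-cluster-data@10.0.0.2:2550"]
          roles = ["member-1"]
        }
      }
    }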

Provide a REST interface to create/destroy a cluster?



 Comments   
Comment by Tom Pantelis [ 03/Feb/15 ]

We could use JMX actions and jolokia's REST API. I would think jolokia supports POST for actions.
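
For illustration, a rough sketch of that approach; the MBean and operation names here are hypothetical, not an existing controller API:

    // ClusterAdminMXBean.java - interface defining the JMX "action" (hypothetical).
    public interface ClusterAdminMXBean {
        void addClusterMember(String memberAddress);
    }

    // ClusterAdmin.java - implementation registered with the platform MBeanServer.
    import java.lang.management.ManagementFactory;
    import javax.management.ObjectName;

    public class ClusterAdmin implements ClusterAdminMXBean {
        @Override
        public void addClusterMember(String memberAddress) {
            // Apply the change to the running ActorSystem and persist it.
        }

        public static void register() throws Exception {
            ManagementFactory.getPlatformMBeanServer().registerMBean(
                new ClusterAdmin(),
                new ObjectName("org.opendaylight.controller:type=ClusterAdmin"));
        }
    }

Jolokia can then invoke such an operation over HTTP with a POST request whose JSON body has type "exec" and names the MBean, operation, and arguments.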

Comment by Mark Mozolewski [ 04/Feb/15 ]

(In reply to Tom Pantelis from comment #1)
> We could use JMX actions and jolokia's REST API. I would think jolokia
> supports POST for actions.

Is performing all our configuration changes on the cluster subsystem via JMX (with Jolokia for REST/HTTP access) in line with CONTROLLER-1064, which brings all our clustering configuration under the Config Subsystem?

I'd like to make sure all existing config (like distributed-datastore-provider.yang), new config for cluster/shard formation, and maybe even Akka are accessed in a consistent way. Thanks.

Comment by Tom Pantelis [ 04/Feb/15 ]

Yeah - we'll have to see if the config system will be sufficient for what we want to do. The config system does support JMX, but currently the CDS JMX beans aren't integrated with it. The config system has limitations. Also, whether and how the config system deals with upgrades is not clear.

Comment by Tom Pantelis [ 04/Feb/15 ]

Regardless of how the JMX beans are registered (via the config system or not), I think we could use JMX actions, as we'll likely want to apply the config changes to the running system as well as persist them. Either that, or use a yang-defined RPC accessed via restconf.
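
For comparison, a minimal sketch of the yang-defined RPC alternative; the module and RPC names here are hypothetical:

    module cluster-admin-sketch {
      namespace "urn:opendaylight:controller:cluster:admin:sketch";
      prefix "cas";

      rpc add-cluster-member {
        input {
          leaf member-address { type string; }
        }
      }
    }

Restconf would then expose the RPC as a POST to /restconf/operations/cluster-admin-sketch:add-cluster-member.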

Comment by Moiz Raja [ 18/Aug/15 ]

This bug covers two things:

1. Dynamically adding/removing new controller nodes to a cluster
2. Dynamically adding/removing shard replicas

Basically, grow/shrink the cluster.

Comment by Kalaiselvi [ 15/Sep/15 ]

Attachment CONTROLLER-934_design.docx has been added with description: Design approach for dynamic addition/removal of controller nodes

Comment by Colin Dixon [ 15/Sep/15 ]

If this involves changing the membership of a given shard, that is going to be a very complex change which requires, among other things, storing the membership information and its modifications as part of the shards themselves.

Comment by Muthukumaran Kothandaraman [ 16/Sep/15 ]

>>> Member addition
>>> The leader for each shard replicates its data to the shard replica member created in the new node.

Comment: When we have a large shard to be replicated, the node would be unusable, right? I.e., while the above replication is in progress, the new node would not be able to participate in any in-flight or incoming transactions; is my understanding right? I understand that this is not specific to dynamic addition, but it can become more prominent when a new "empty" node is added to a shard.

>>> The listeners in the cluster group controller nodes listening for changes in the ‘shards’ list informs ShardManager to add shard replicas for its set of local shards

Comment: Since updating the ShardManager is a node-local action, it would not be a problem if we use clustered datatree change listeners. We must just ensure that the above listener does not itself perform cluster-wide CDS operations on other shards, because those would be executed once per member, which may not be desirable. A sketch of such a listener follows.
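
A sketch, assuming the MD-SAL ClusteredDataTreeChangeListener API ('Shards' is a hypothetical binding type for the replicated shards list):

    import java.util.Collection;
    import org.opendaylight.controller.md.sal.binding.api.ClusteredDataTreeChangeListener;
    import org.opendaylight.controller.md.sal.binding.api.DataTreeModification;

    public class ShardsListListener implements ClusteredDataTreeChangeListener<Shards> {
        @Override
        public void onDataTreeChanged(Collection<DataTreeModification<Shards>> changes) {
            // Node-local reaction only: tell the local ShardManager to
            // create/remove its shard replicas. Do not issue cluster-wide CDS
            // writes here; every member's listener would repeat them.
        }
    }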

>>> Dynamic removal of controller node from cluster

Comment:
a) When a node is removed, do we honor the replication criterion of the shard definition or not? E.g., shard S1 is configured with 3 replicas on members M1, M2 and M3, and we bring down M3. If we then examine the "shard-status" via RESTCONF, we would see only M1 and M2 as members of S1, right?

b) We might have to warn on, or fail, the administrator's action when an attempt is made to remove all members of a shard.

One possible solution to (b): we can define a "min-live-shard-members" parameter and set it to, say, 2; when a removal is attempted while only 2 members of the shard are present, we fail the removal request with an error message (roughly as sketched below).
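
A minimal sketch of that guard, with hypothetical names:

    // Rejects replica removal when the shard is already at the configured floor.
    public class ReplicaRemovalGuard {
        private final int minLiveShardMembers;

        public ReplicaRemovalGuard(int minLiveShardMembers) {
            this.minLiveShardMembers = minLiveShardMembers;
        }

        public void checkRemoval(String shardName, int currentReplicaCount) {
            if (currentReplicaCount <= minLiveShardMembers) {
                throw new IllegalStateException(String.format(
                    "Cannot remove a replica of shard %s: only %d members remain"
                    + " (min-live-shard-members=%d)",
                    shardName, currentReplicaCount, minLiveShardMembers));
            }
        }
    }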

>>> JMX mbean API will be exposed to allow dynamic removal of a controller node from the cluster group

Comment: Since the model is already available via RESTCONF for modifications, would it not be sufficient to just use that, or an RPC (as indicated by Tom), instead of JMX?

Would it be better if we had an FSM depicting the various states a new node transitions through before it becomes available for transactions?

Comment by Tom Pantelis [ 20/Oct/15 ]

Initial patches:

https://git.opendaylight.org/gerrit/#/c/28018/
https://git.opendaylight.org/gerrit/#/c/27742/
https://git.opendaylight.org/gerrit/#/c/28596/

Comment by Moiz Raja [ 10/Nov/15 ]

https://gist.github.com/moizr/9401c6c7f427ec4bbff4
