[CONTROLLER-1743] New ODL node unable to join existing cluster at runtime Created: 25/Jul/17 Updated: 30/Oct/17 Resolved: 06/Sep/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Carbon |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Bertrand Low | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 8885 |
| Description |
|
In Boron, a new ODL node could join an existing cluster at runtime (i.e. without having to restart the existing cluster and changing its cluster configuration) by:
In testing with the official Carbon release (distribution-karaf-0.6.0-Carbon.tar.gz) and with an SR2 nexus snapshot (distribution-karaf-0.6.2-20170724.132221-179.tar.gz), there appears to be a startup issue that prevents the new ODL node from joining the existing cluster.

This is the setup:

1) An existing 3-node ODL cluster consisting of member-1 (192.168.50.33), member-2 (192.168.50.34), and member-3 (192.168.50.35).

2) Configure the akka.conf for the new ODL node, 192.168.50.36, to use the member name "member-4" and seed it with the current leader of the cluster (or any other node, or all the nodes of the cluster):

    cluster {
      # Remove ".tcp" when using artery.
      seed-nodes = ["akka.tcp://opendaylight-cluster-data@192.168.50.33:2550"]
      roles = [ "member-4" ]
    }

3) Start up member-4 and install the clustering feature (module-shards.conf defaults to "member-1" as the only replica). Note that member-4 gets the Welcome message from the seed node:

    2017-07-24 17:54:41,162 | INFO | ult-dispatcher-6 | kka://opendaylight-cluster-data) | 179 - com.typesafe.akka.slf4j - 2.4.18 | Cluster Node [akka.tcp://opendaylight-cluster-data@192.168.50.36:2550] - Welcome from [akka.tcp://opendaylight-cluster-data@192.168.50.33:2550]

but there are errors such as:

    2017-07-24 17:56:27,849 | ERROR | l for user karaf | ServiceRecipe | 15 - org.apache.aries.blueprint.core - 1.6.1 | Error retrieving service from ServiceRecipe[name='.component-6']
    2017-07-24 17:56:27,851 | ERROR | ult-dispatcher-4 | Futures$CombinedFuture | 65 - com.google.guava - 18.0.0 | input future failed.
    2017-07-24 17:56:28,061 | ERROR | l for user karaf | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.6.1 | Unable to start blueprint container for bundle org.opendaylight.controller.sal-distributed-datastore/1.5.2.SNAPSHOT
    2017-07-24 17:56:41,521 | WARN | saction-33-35'}} | DeadlockMonitor | 127 - org.opendaylight.controller.config-manager - 0.6.2.SNAPSHOT | ModuleIdentifier {factoryName='runtime-generated-mapping', instanceName='runtime-mapping-singleton'}did not finish after 9904 ms

4) When the existing cluster is queried, member-4 is listed with status "Up" but is reported as unreachable by all three existing members:

    {"request": {"mbean":"akka:type=Cluster","type":"read"},"value":{"Leader":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550","Unreachable":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550","Singleton":false,"Available":true,"MemberStatus":"Up","ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"members\": [\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-1\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-2\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-3\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-4\"\n ]\n }\n ],\n \"unreachable\": [\n {\n \"node\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550\",\n \"observed-by\": [\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550\",\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550\"\n ]\n }\n ]\n}\n","Members":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550"},"timestamp":1500933425,"status":200}

5) Also, when trying to add-replicas-for-all-shards on member-4, the following error is returned:

    <head> <pre> Unauthorized</pre> |
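The cluster query in step 4 is easier to read once the nested ClusterStatus field is decoded, since that field is itself a JSON document stored as a string. The following is a minimal Python sketch of that decoding, not part of the original report. The helper names (addr, unreachable_members) and the trimmed SAMPLE_RESPONSE are hypothetical, and the sample keeps only the fields the sketch reads.

```python
import json


def addr(ip):
    """Build an Akka address in the form used throughout this report."""
    return f"akka.tcp://opendaylight-cluster-data@{ip}:2550"


# Trimmed, hand-built copy of the response quoted in step 4 (assumption:
# only the fields read below are kept; values match the report).
SAMPLE_RESPONSE = json.dumps({
    "request": {"mbean": "akka:type=Cluster", "type": "read"},
    "value": {
        "Leader": addr("192.168.50.33"),
        "ClusterStatus": json.dumps({
            "self-address": addr("192.168.50.33"),
            "members": [
                {"address": addr("192.168.50.33"), "status": "Up", "roles": ["member-1"]},
                {"address": addr("192.168.50.34"), "status": "Up", "roles": ["member-2"]},
                {"address": addr("192.168.50.35"), "status": "Up", "roles": ["member-3"]},
                {"address": addr("192.168.50.36"), "status": "Up", "roles": ["member-4"]},
            ],
            "unreachable": [
                {
                    "node": addr("192.168.50.36"),
                    "observed-by": [
                        addr("192.168.50.33"),
                        addr("192.168.50.34"),
                        addr("192.168.50.35"),
                    ],
                }
            ],
        }),
    },
    "status": 200,
})


def unreachable_members(response_text):
    """Return one dict per unreachable node: its address, roles, and observers."""
    value = json.loads(response_text)["value"]
    # ClusterStatus is a JSON string inside the JSON response, so decode it again.
    status = json.loads(value["ClusterStatus"])
    roles_by_address = {m["address"]: m["roles"] for m in status["members"]}
    return [
        {
            "node": entry["node"],
            "roles": roles_by_address.get(entry["node"], []),
            "observed_by": entry["observed-by"],
        }
        for entry in status.get("unreachable", [])
    ]


if __name__ == "__main__":
    for item in unreachable_members(SAMPLE_RESPONSE):
        print(item["roles"], "unreachable, observed by", len(item["observed_by"]), "node(s)")
```

Run against the full response from step 4, the same function should report member-4 as the only unreachable node, observed by all three existing members.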
| Comments |
| Comment by Bertrand Low [ 25/Jul/17 ] |
|
Attachment carbon-error_karaf.log has been added with description: karaf log of the new node when the features "odl-mdsal-clustering odl-jolokia" are installed |
| Comment by Bertrand Low [ 25/Jul/17 ] |
|
Attachment carbon-error-netvirt-install_karaf.log has been added with description: karaf log of the new node when "odl-netvirt-openstack" is installed |
| Comment by Bertrand Low [ 26/Jul/17 ] |
|
This is actually a duplicate of |
| Comment by Tom Pantelis [ 27/Jul/17 ] |
| Comment by Bertrand Low [ 03/Aug/17 ] |
|
Hi Tom, can we cherry-pick this fix to Carbon as well? Thanks. |
| Comment by Tom Pantelis [ 03/Aug/17 ] |
|
(In reply to Bertrand Low from comment #5) stable/carbon - https://git.opendaylight.org/gerrit/#/c/61132/ |
| Comment by Robert Varga [ 03/Aug/17 ] |
|
Let's wait with Carbon until Vratko can confirm what it does to CSIT... |
| Comment by Robert Varga [ 24/Aug/17 ] |
|
Sorry, this is a regression in functionality shipped in a previous release, so it clearly blocks Carbon SR2. |
| Comment by Venkatrangan Govindarajan [ 06/Sep/17 ] |
|
All patches are merged. The problem is fixed. |