  1. controller
  2. CONTROLLER-1743

New ODL node unable to join existing cluster at runtime


    • Type: Bug
    • Resolution: Done
    • None
    • Carbon
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 8885

      In Boron, a new ODL node could join an existing cluster at runtime (i.e. without restarting the existing cluster or changing its cluster configuration) by:

      • starting the new ODL node with a different role name than any of the nodes in the existing cluster
      • seeding the new node with a node in the existing cluster
      • issuing the rpc restconf/operations/cluster-admin:add-replicas-for-all-shards on the new node
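As an illustration of the last step, here is a minimal Python sketch that builds the POST for the add-replicas-for-all-shards RPC. Only the RPC path comes from this report. The port (8181), the admin/admin credentials, and the function names are assumptions, so adjust them to the actual deployment.

```python
import base64
import urllib.request

RPC_PATH = "/restconf/operations/cluster-admin:add-replicas-for-all-shards"


def build_add_replicas_request(host, port=8181, user="admin", password="admin"):
    """Build (but do not send) the POST request for the RPC.

    port, user and password are assumed defaults, not values from the report.
    """
    url = f"http://{host}:{port}{RPC_PATH}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def send(req, timeout=30):
    """Send the request and return (HTTP status, response body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode()
```

Calling `send(build_add_replicas_request("192.168.50.36"))` would issue the RPC against the new node. Note that `urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx response.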

      In testing with the official Carbon release (distribution-karaf-0.6.0-Carbon.tar.gz) and with an SR2 Nexus snapshot (distribution-karaf-0.6.2-20170724.132221-179.tar.gz), the new ODL node fails during startup and is not added to the existing cluster.

      This is the setup:

      1) an existing 3-node ODL cluster consisting of member-1 (192.168.50.33), member-2 (192.168.50.34), and member-3 (192.168.50.35)

      2) configure the akka.conf for the new ODL node, 192.168.50.36, to use the member name "member-4" and seed it with the current leader of the cluster (or any other node, or all the nodes of the cluster):

      cluster {
        # Remove ".tcp" when using artery.
        seed-nodes = ["akka.tcp://opendaylight-cluster-data@192.168.50.33:2550"]
        roles = [ "member-4" ]
      }
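For repeatable testing, the cluster stanza can be generated rather than edited by hand. The following sketch renders only the `cluster` block shown here. The function name and its parameters are hypothetical, and the rest of akka.conf is left untouched.

```python
def render_cluster_stanza(member_name, seed_hosts, port=2550,
                          system="opendaylight-cluster-data"):
    """Render the akka.conf cluster block for one member.

    seed_hosts is a list of IP addresses of nodes already in the cluster.
    """
    seeds = ", ".join(
        f'"akka.tcp://{system}@{host}:{port}"' for host in seed_hosts
    )
    return (
        "cluster {\n"
        f"  seed-nodes = [{seeds}]\n"
        f'  roles = ["{member_name}"]\n'
        "}\n"
    )
```

For example, `render_cluster_stanza("member-4", ["192.168.50.33"])` produces a block equivalent to the one used in this setup.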

      3) start up member-4 and install the clustering feature (module-shards.conf defaults to "member-1" as the only replica)

      note that member-4 gets the Welcome message from the seed node:

      2017-07-24 17:54:41,162 | INFO | ult-dispatcher-6 | kka://opendaylight-cluster-data) | 179 - com.typesafe.akka.slf4j - 2.4.18 | Cluster Node [akka.tcp://opendaylight-cluster-data@192.168.50.36:2550] - Welcome from [akka.tcp://opendaylight-cluster-data@192.168.50.33:2550]

      but there are errors such as:

      2017-07-24 17:56:27,849 | ERROR | l for user karaf | ServiceRecipe | 15 - org.apache.aries.blueprint.core - 1.6.1 | Error retrieving service from ServiceRecipe[name='.component-6']
      org.osgi.service.blueprint.container.ComponentDefinitionException: Unable to initialize bean cdsNodeManager

      2017-07-24 17:56:27,851 | ERROR | ult-dispatcher-4 | Futures$CombinedFuture | 65 - com.google.guava - 18.0.0 | input future failed.
      org.opendaylight.controller.cluster.sharding.DOMDataTreeShardCreationFailedException: Unable to find the backend shard.Failing..

      2017-07-24 17:56:28,061 | ERROR | l for user karaf | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.6.1 | Unable to start blueprint container for bundle org.opendaylight.controller.sal-distributed-datastore/1.5.2.SNAPSHOT
      org.osgi.service.blueprint.container.ComponentDefinitionException: org.osgi.service.blueprint.container.ComponentDefinitionException: Unable to initialize bean cdsNodeManager

      2017-07-24 17:56:41,521 | WARN | saction-33-35'}} | DeadlockMonitor | 127 - org.opendaylight.controller.config-manager - 0.6.2.SNAPSHOT | ModuleIdentifier{factoryName='runtime-generated-mapping', instanceName='runtime-mapping-singleton'} did not finish after 9904 ms
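To pull entries like these out of a longer karaf.log, a small parser can split each line on its pipe separators. This is a sketch only: the field names are assumptions inferred from the log lines quoted above, and continuation lines (such as exception text) are skipped.

```python
FIELDS = ("timestamp", "level", "thread", "logger", "bundle", "message")


def parse_karaf_line(line):
    """Split one pipe-delimited log line into a dict, or return None
    if the line does not have the six expected fields."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def errors_and_warnings(lines):
    """Yield parsed entries whose level is ERROR or WARN."""
    for line in lines:
        entry = parse_karaf_line(line)
        if entry and entry["level"] in ("ERROR", "WARN"):
            yield entry
```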

      4) when the existing cluster is queried, member-4 shows up as unreachable:

      {"request":{"mbean":"akka:type=Cluster","type":"read"},"value":{"Leader":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550","Unreachable":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550","Singleton":false,"Available":true,"MemberStatus":"Up","ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"members\": [\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-1\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-2\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-3\"\n ]\n },\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550\",\n \"status\": \"Up\",\n \"roles\": [\n \"member-4\"\n ]\n }\n ],\n \"unreachable\": [\n {\n \"node\": \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550\",\n \"observed-by\": [\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550\",\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550\",\n \"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550\"\n ]\n }\n ]\n}\n","Members":"akka.tcp:\/\/opendaylight-cluster-data@192.168.50.33:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.34:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.35:2550,akka.tcp:\/\/opendaylight-cluster-data@192.168.50.36:2550"},"timestamp":1500933425,"status":200}
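Because ClusterStatus is itself a JSON document embedded as a string, the response has to be decoded twice to get at the member list. A minimal sketch that lists unreachable nodes with their roles follows. The function name is hypothetical, and the code assumes a response shaped like the one above.

```python
import json


def unreachable_members(response_text):
    """Return unreachable nodes from an akka:type=Cluster read response."""
    outer = json.loads(response_text)
    # ClusterStatus is a JSON string nested inside the outer JSON.
    status = json.loads(outer["value"]["ClusterStatus"])
    roles_by_address = {m["address"]: m["roles"] for m in status["members"]}
    return [
        {
            "address": entry["node"],
            "roles": roles_by_address.get(entry["node"], []),
            "observed_by": entry["observed-by"],
        }
        for entry in status["unreachable"]
    ]
```

Applied to the response above, this would report 192.168.50.36 (member-4) as unreachable, observed by the three existing members.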

      5) also, invoking the add-replicas-for-all-shards RPC on member-4 returns the following error:

      <head>
      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
      <title>Error 401 Unauthorized</title>
      </head>
      <body>
      <h2>HTTP ERROR 401</h2>
      <p>Problem accessing /restconf/operations/cluster-admin:add-replicas-for-all-shards. Reason:

      <pre> Unauthorized</pre>
      </p>
      <hr />
      <i>
      <small>Powered by Jetty://</small>

            Assignee: Unassigned
            Reporter: Bertrand Low (bertrandlow)