[CONTROLLER-1481] Singleton cluster is not formed Created: 04/Feb/16  Updated: 19/Oct/17  Resolved: 04/Feb/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: ranjithkumar_t Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File akka.conf    
External issue ID: 5232

 Description   

Cluster instances are 10.106.138.107, 10.106.138.111, 10.106.138.251

Observation-1 :

In a 3-node cluster, if all nodes go down and the node that was the leader while the nodes were up (10.106.138.107) is restarted, a singleton cluster is achieved: a leader is elected and this node acts as a singleton cluster.

http://10.106.138.107:8181/jolokia/read/akka:type=Cluster

{"timestamp":1452166117,"status":200,"request":{"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":true,"ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"members\": [\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"status\": \"Up\"\n }\n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550","Leader":"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550","MemberStatus":"Up","Singleton":true}}
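The ClusterStatus field in the Jolokia response is itself a JSON document encoded as a string, so reading the member list takes two decoding steps. The following is a minimal sketch in Python, not part of the original report. The RAW value is the Observation-1 response with the request and a few other fields trimmed for brevity, and the summarize helper is a hypothetical name.

```python
import json

# Observation-1 response, trimmed: the "request", "Unreachable" and
# "Members" fields are omitted here for brevity. A raw string keeps the
# JSON backslash escapes exactly as Jolokia returned them.
RAW = r'''{"timestamp":1452166117,"status":200,"value":{"Available":true,"ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"members\": [\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"status\": \"Up\"\n }\n ],\n \"unreachable\": [\n \n ]\n}\n","Leader":"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550","MemberStatus":"Up","Singleton":true}}'''


def summarize(raw: str) -> dict:
    """Decode the outer response, then the nested ClusterStatus string."""
    value = json.loads(raw)["value"]
    status = json.loads(value["ClusterStatus"])  # second decoding step
    return {
        "leader": value["Leader"],
        "member_status": value["MemberStatus"],
        "singleton": value["Singleton"],
        "members": [m["address"] for m in status["members"]],
        "unreachable": status["unreachable"],
    }


summary = summarize(RAW)
print(summary)
```

For a node in the state shown in Observation-2, the same helper would return an empty member list and an empty leader string.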

Observation-2 :

If all 3 nodes go down and one of the non-leader nodes (10.106.138.251 or 10.106.138.111) is restarted, no leader is elected and a singleton cluster is not achieved.

http://10.106.138.251:8181/jolokia/read/akka:type=Cluster

{"timestamp":1452182366,"status":200,"request":{"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":false,"ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.251:2550\",\n \"members\": [\n \n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"","Leader":"","MemberStatus":"Removed","Singleton":false}}

http://10.106.138.111:8181/jolokia/read/akka:type=Cluster

{"timestamp":1452165274,"status":200,"request":{"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":false,"ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.111:2550\",\n \"members\": [\n \n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"","Leader":"","MemberStatus":"Removed","Singleton":false}}

I have tried this many times and always got the observations above.

To reproduce this issue, follow the steps below:
1) Bring up all 3 nodes.
2) After one node has been elected leader, bring all 3 nodes down.
3) Restart any one of the nodes that was not the leader.



 Comments   
Comment by ranjithkumar_t [ 04/Feb/16 ]

Attachment akka.conf has been added with description: For your reference I have attached the akka.conf file

Comment by Tom Pantelis [ 04/Feb/16 ]

This is due to the special nature of the first seed node in the list, 10.106.138.107. The first seed node must be up in order for the other seed nodes to form a cluster, which explains observation 2. For observation 1, the first seed node tries to contact and join another seed node; if that fails, it joins itself and forms a cluster. This is governed by the seed-node-timeout setting in the cluster section of akka.conf. Akka's default is 5s, but we've seen that in some environments it needs to be increased to avoid cluster islands.

In Be, we set the default for seed-node-timeout to 12s in the akka.conf that we ship. This alleviated the issue on the systems where we had observed it. The change was backported to Li by https://git.opendaylight.org/gerrit/#/c/29347/.

The attached akka.conf doesn't have the seed-node-timeout setting and it also contains the odl-cluster-rpc section which was removed in Be. I don't know what version you're using but you need to update your akka.conf. If 12s isn't enough in your environment then increase it further.
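As an illustration of the two things to check in an akka.conf, the sketch below scans the file text for the seed-node-timeout setting and for a leftover odl-cluster-rpc section. It is a rough Python sketch under stated assumptions: the sample layout (odl-cluster-data / akka / cluster nesting and the order of the second and third seed nodes) is assumed rather than taken from the attachment, check_conf is a hypothetical helper, and a line-based regex is only an approximation of real HOCON parsing.

```python
import re

# Hypothetical akka.conf excerpt. Only the seed-node-timeout key, the 12s
# value and the first seed node come from the comment above; the nesting
# and the remaining seed-node order are assumptions for illustration.
SAMPLE_CONF = """
odl-cluster-data {
  akka {
    cluster {
      seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.106.138.107:2550",
                    "akka.tcp://opendaylight-cluster-data@10.106.138.111:2550",
                    "akka.tcp://opendaylight-cluster-data@10.106.138.251:2550"]
      seed-node-timeout = 12s
    }
  }
}
"""


def check_conf(text: str) -> dict:
    """Report the seed-node-timeout value and any odl-cluster-rpc section."""
    timeout = re.search(
        r"^\s*seed-node-timeout\s*[=:]\s*(\S+)", text, re.MULTILINE
    )
    rpc = re.search(r"^\s*odl-cluster-rpc\b", text, re.MULTILINE)
    return {
        "seed_node_timeout": timeout.group(1) if timeout else None,
        "has_odl_cluster_rpc": rpc is not None,
    }


result = check_conf(SAMPLE_CONF)
print(result)
```

A file like the attached one would be expected to report no seed-node-timeout and an odl-cluster-rpc section present, which are the two points raised above.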
