[CONTROLLER-1481] Singleton Cluster is not happened Created: 04/Feb/16 Updated: 19/Oct/17 Resolved: 04/Feb/16 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Beryllium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | ranjithkumar_t | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 5232 |
| Description |
|
Cluster instances are 10.106.138.107, 10.106.138.111, 10.106.138.251.

Observation 1: In a 3-node cluster, all nodes go down and then node 10.106.138.107, which acted as the leader while all nodes were up, is restarted. A leader is elected and this node forms a singleton cluster.

http://10.106.138.107:8181/jolokia/read/akka:type=Cluster
{"timestamp":1452166117,"status":200,"request": {"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":true,"ClusterStatus":"{\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"members\": [\n {\n \"address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550\",\n \"status\": \"Up\"\n }\n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550","Leader":"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.107:2550","MemberStatus":"Up","Singleton":true}}

Observation 2: All 3 nodes go down and then either one of the non-leader nodes (10.106.138.251 or 10.106.138.111) is restarted. No leader is elected and no singleton cluster is formed.

http://10.106.138.251:8181/jolokia/read/akka:type=Cluster
{"timestamp":1452182366,"status":200,"request": {"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":false,"ClusterStatus":" {\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.251:2550\",\n \"members\": [\n \n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"","Leader":"","MemberStatus":"Removed","Singleton":false}}

http://10.106.138.111:8181/jolokia/read/akka:type=Cluster
{"timestamp":1452165274,"status":200,"request": {"mbean":"akka:type=Cluster","type":"read"},"value":{"Available":false,"ClusterStatus":" {\n \"self-address\": \"akka.tcp:\/\/opendaylight-cluster-data@10.106.138.111:2550\",\n \"members\": [\n \n ],\n \"unreachable\": [\n \n ]\n}\n","Unreachable":"","Members":"","Leader":"","MemberStatus":"Removed","Singleton":false}}

I have tried this many times and always get the observations above. To reproduce this issue, follow the steps below. |
| Comments |
| Comment by ranjithkumar_t [ 04/Feb/16 ] |
|
Attachment akka.conf has been added with description: For your reference, I have attached the akka.conf file |
| Comment by Tom Pantelis [ 04/Feb/16 ] |
|
This is due to the special nature of the first seed node in the list: 10.106.138.107. The first seed node must be up in order for the other seed nodes to form a cluster (observation 2). For observation 1, the first seed node will try to contact and join another seed node. If that fails, it will join itself and form a cluster. This is governed by the seed-node-timeout in the cluster section of the akka.conf - akka's default is 5s, but we've seen that in some environments it needs to be increased to avoid cluster islands. In Be, we set the default for seed-node-timeout to 12s in the akka.conf that we ship. This alleviated the issue on the systems where we had observed it. This was backported to Li by https://git.opendaylight.org/gerrit/#/c/29347/. The attached akka.conf doesn't have the seed-node-timeout setting, and it also contains the odl-cluster-rpc section, which was removed in Be. I don't know what version you're using, but you need to update your akka.conf. If 12s isn't enough in your environment, increase it further. |
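For illustration, the cluster section described above might look roughly like the fragment below. This is a sketch, not the shipped file: the enclosing odl-cluster-data block is assumed from the actor system name in the addresses, and the order of the second and third seed nodes is an assumption. Only the first entry (10.106.138.107) and the 12s value come from the comment above.

```
# Sketch of the relevant akka.conf settings (surrounding layout is assumed)
odl-cluster-data {
  akka {
    cluster {
      # The first entry is the special "first seed node": only it may join
      # itself to form a new cluster when no other seed node responds.
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@10.106.138.107:2550",
        "akka.tcp://opendaylight-cluster-data@10.106.138.111:2550",
        "akka.tcp://opendaylight-cluster-data@10.106.138.251:2550"
      ]

      # Akka's default is 5s; Be ships 12s. Increase further if needed.
      seed-node-timeout = 12s
    }
  }
}
```

With a list like this, restarting only 10.106.138.111 or 10.106.138.251 leaves that node waiting for the first seed node, which matches observation 2.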