[CONTROLLER-1790] Leader can currently not perform its duties Created: 17/Nov/17 Updated: 23/Aug/18 Resolved: 23/Aug/18 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sam Hague | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | csit:3node | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Epic Link: | Clustering Stability | ||||||||
| Description |
|
The messages below appear in the followers' logs as the leader node is taken down.

ROBOT MESSAGE: Starting test Take Down ODL1
ROBOT MESSAGE: Killing ODL1 10.29.14.5

2017-11-15 02:53:11,625 | WARN | ult-dispatcher-5 | NettyTransport | 178 - com.typesafe.akka.slf4j - 2.4.18 | Remote connection to null failed with java.net.ConnectException: Connection refused: /10.29.14.5:2550

2017-11-15 02:53:11,626 | WARN | ult-dispatcher-5 | ReliableDeliverySupervisor | 178 - com.typesafe.akka.slf4j - 2.4.18 | Association with remote system [akka.tcp://opendaylight-cluster-data@10.29.14.5:2550] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://opendaylight-cluster-data@10.29.14.5:2550]] Caused by: [Connection refused: /10.29.14.5:2550]

2017-11-15 02:53:16,646 | WARN | lt-dispatcher-38 | NettyTransport | 178 - com.typesafe.akka.slf4j - 2.4.18 | Remote connection to null failed with java.net.ConnectException: Connection refused: /10.29.14.5:2550

2017-11-15 02:53:20,936 | INFO | ult-dispatcher-3 | kka://opendaylight-cluster-data) | 178 - com.typesafe.akka.slf4j - 2.4.18 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.15.176:2550] - Leader can currently not perform its duties, reachability status: [akka.tcp://opendaylight-cluster-data@10.29.15.176:2550 -> akka.tcp://opendaylight-cluster-data@10.29.14.5:2550: Unreachable [Unreachable] (1), akka.tcp://opendaylight-cluster-data@10.29.15.237:2550 -> akka.tcp://opendaylight-cluster-data@10.29.14.5:2550: Unreachable [Unreachable] (1)], member status: [akka.tcp://opendaylight-cluster-data@10.29.14.5:2550 Up seen=false, akka.tcp://opendaylight-cluster-data@10.29.15.176:2550 Up seen=true, akka.tcp://opendaylight-cluster-data@10.29.15.237:2550 Up seen=true] |
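For triage, the "Leader can currently not perform its duties" message can be split into its reachability and member-status parts to confirm which node the others consider unreachable. The following is a minimal Python sketch, not part of the CSIT suite. The function name `parse_leader_warning` and the regular expressions are assumptions based only on the message layout shown in this ticket, and other Akka versions may format the message differently.

```python
import re

# "observer -> subject: Status" pairs from the reachability section.
REACHABILITY_RE = re.compile(r"(akka\.tcp://\S+?) -> (akka\.tcp://\S+?): (\w+)")
# "address Status seen=bool" entries from the member status section.
MEMBER_RE = re.compile(r"(akka\.tcp://\S+) (\w+) seen=(true|false)")


def parse_leader_warning(message):
    """Return (sorted unreachable addresses, member info dict) for one log message."""
    reach_part, _, member_part = message.partition("member status:")
    unreachable = sorted(
        {
            subject
            for _observer, subject, status in REACHABILITY_RE.findall(reach_part)
            if status == "Unreachable"
        }
    )
    members = {
        addr: {"status": status, "seen": seen == "true"}
        for addr, status, seen in MEMBER_RE.findall(member_part)
    }
    return unreachable, members


# Message text taken from the log excerpt in this ticket.
PREFIX = "akka.tcp://opendaylight-cluster-data@"
SAMPLE = (
    f"Cluster Node [{PREFIX}10.29.15.176:2550] - Leader can currently not perform "
    f"its duties, reachability status: [{PREFIX}10.29.15.176:2550 -> "
    f"{PREFIX}10.29.14.5:2550: Unreachable [Unreachable] (1), "
    f"{PREFIX}10.29.15.237:2550 -> {PREFIX}10.29.14.5:2550: Unreachable "
    f"[Unreachable] (1)], member status: [{PREFIX}10.29.14.5:2550 Up seen=false, "
    f"{PREFIX}10.29.15.176:2550 Up seen=true, {PREFIX}10.29.15.237:2550 Up seen=true]"
)

unreachable, members = parse_leader_warning(SAMPLE)
```

On the message above, both surviving nodes report 10.29.14.5 (the killed ODL1) as unreachable, and that member is still listed as Up with seen=false.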
| Comments |
| Comment by Tom Pantelis [ 17/Nov/17 ] |
|
That's reported from akka when connection is lost to a node. Looks like all normal stuff. |
| Comment by Sam Hague [ 19/Jun/18 ] |
|
Could the way we kill nodes cause any issues? The current method to kill the node is below.

${NODE_KILL_COMMAND} ps axf | grep org.apache.karaf | grep -v grep | awk '{print \"kill -9 \" $1}' | sh
|
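For readers less familiar with the pipeline above: it keeps `ps` lines that mention `org.apache.karaf`, drops the `grep` process itself, and turns the first column (the PID) into a `kill -9` command. A rough Python equivalent of that filtering step is sketched below. The function name `karaf_kill_commands` and the sample `ps` output are hypothetical and for illustration only. The sketch only builds the command strings and does not execute them.

```python
def karaf_kill_commands(ps_output):
    """Mimic: grep org.apache.karaf | grep -v grep | awk '{print "kill -9 " $1}'."""
    commands = []
    for line in ps_output.splitlines():
        # Keep only karaf lines, and drop any line that mentions grep.
        if "org.apache.karaf" not in line or "grep" in line:
            continue
        fields = line.split()
        if fields:
            # With `ps axf`, the first whitespace-separated field is the PID.
            commands.append("kill -9 " + fields[0])
    return commands


# Hypothetical `ps axf` output, invented for this example.
SAMPLE_PS = """\
  PID TTY      STAT   TIME COMMAND
 4321 ?        Sl    12:34 /usr/bin/java -Dkaraf.home=/tmp/karaf org.apache.karaf.main.Main
 4400 pts/0    S+     0:00 grep org.apache.karaf
 4500 ?        Ss     0:01 /usr/sbin/sshd -D
"""

commands = karaf_kill_commands(SAMPLE_PS)
```

Because `kill -9` sends SIGKILL, the JVM gets no chance to leave the cluster gracefully, so the remaining members can only detect the loss through failed connections.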
| Comment by Tom Pantelis [ 19/Jun/18 ] |
|
That shouldn't matter - that message is normal akka behavior when a node becomes unavailable to the leader for any reason. By "duties", it means it can't let any new node into the cluster until all existing ones are either reachable or specifically declared as "down". That's the way it works - all existing nodes have to be able to reach the new node and agree to allow it in. Do you see anything actionable here or OK to close? |
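The rule described in this comment can be illustrated with a small model. This is a deliberately simplified Python sketch of the explanation above, not Akka's actual convergence logic. The function `leader_can_perform_duties` and the short node names are assumptions made up for the example.

```python
def leader_can_perform_duties(members, unreachable):
    """Simplified model: the leader is blocked while any unreachable member
    has not been explicitly marked "Down".

    members: dict mapping node name -> status string (e.g. "Up", "Down")
    unreachable: set of node names currently observed as unreachable
    """
    return all(members.get(node) == "Down" for node in unreachable)


# State right after ODL1 is killed: still "Up" in membership but unreachable.
members = {"odl1": "Up", "odl2": "Up", "odl3": "Up"}
blocked = leader_can_perform_duties(members, {"odl1"})

# Either outcome unblocks the leader: the node is declared down ...
after_down = leader_can_perform_duties({**members, "odl1": "Down"}, {"odl1"})
# ... or it comes back and is reachable again.
after_restart = leader_can_perform_duties(members, set())
```

In this model `blocked` is False while the other two are True, which matches the comment: the message persists only until the killed node is reachable again or is declared down.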
| Comment by Tom Pantelis [ 19/Jul/18 ] |
|
shague ok to close here? |
| Comment by Jamo Luhrsen [ 14/Aug/18 ] |
|
Wondering if this came from a blown up test because of |
| Comment by Jamo Luhrsen [ 23/Aug/18 ] |
|
We have an example of a 100% passing netvirt CSIT job where we still see the message "Leader can currently not perform its duties", which, as tpantelis notes, is an expected message when |