-
Bug
-
Resolution: Duplicate
-
None
-
Helium
-
None
-
Operating System: All
Platform: All
-
2035
Establish a three-node odl-mdsal-clustering cluster, then turn off the NIC on one of the nodes for 15 seconds or so, then turn it back on.
The temporarily lost node will not successfully reconnect until its controller process is restarted, apparently by design, judging by the INFO log messages.
I'm not sure why the restart is required, but if it remains necessary, then we'll need the node to restart itself automatically when it detects that it has been quarantined and cannot rejoin without a restart.
Especially in lab situations where clustering confidence is first established, switches frequently get shut down for more than 15 seconds, and our SDN controller's cluster should auto-recover from this, preferably without orchestration hacks to cover for it.
2014-09-22 14:02:42,521 | WARN | lt-dispatcher-17 | Remoting | 234 - com.typesafe.akka.slf4j - 2.3.4 | Tried to associate with unreachable remote address [akka.tcp://opendaylight-cluster-data@192.168.1.26:2550]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
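For anyone trying to confirm that a node has hit this state, here is a minimal sketch of how the quarantine condition could be detected from the log. It assumes the pipe-delimited log layout shown in the line above; the function names `parse_log_line` and `quarantined_by` are hypothetical helpers for illustration and are not part of the controller or of Akka.

```python
import re

# Phrase taken verbatim from the WARN message quoted in this report.
QUARANTINE_MARKER = "The remote system has quarantined this system"

# Matches the bracketed remote address, e.g. [akka.tcp://name@host:port].
ADDRESS_RE = re.compile(r"\[(akka\.tcp://[^\]]+)\]")

# The log line from this report, used as sample input.
SAMPLE = (
    "2014-09-22 14:02:42,521 | WARN | lt-dispatcher-17 | Remoting | "
    "234 - com.typesafe.akka.slf4j - 2.3.4 | Tried to associate with "
    "unreachable remote address "
    "[akka.tcp://opendaylight-cluster-data@192.168.1.26:2550]. "
    "Address is now gated for 5000 ms, all messages to this address will be "
    "delivered to dead letters. Reason: The remote system has quarantined "
    "this system. No further associations to the remote system are possible "
    "until this system is restarted."
)


def parse_log_line(line):
    """Split one log line into fields (assumed layout: six pipe-separated columns)."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None
    keys = ("timestamp", "level", "thread", "logger", "bundle", "message")
    return dict(zip(keys, parts))


def quarantined_by(line):
    """Return the remote address that quarantined this node, or None."""
    record = parse_log_line(line)
    if record is None or QUARANTINE_MARKER not in record["message"]:
        return None
    match = ADDRESS_RE.search(record["message"])
    return match.group(1) if match else None


if __name__ == "__main__":
    print(quarantined_by(SAMPLE))
```

This only identifies the condition; what to do once it is detected (for example, restarting the controller process) is left open, and a log watcher of this kind is exactly the sort of external workaround the report hopes to avoid.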
- duplicates
-
CONTROLLER-1396 Clustering: Node does not rejoin after restart
- Resolved
- is duplicated by
-
CONTROLLER-1102 Clustering : Disable auto downing of a an unreachable node
- Resolved