  1. controller
  2. CONTROLLER-883

Clustering: Network Seg (> seconds) between cluster nodes requires restart


    • Type: Bug
    • Resolution: Duplicate
    • None
    • Helium
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 2035

      Establish a three-node odl-mdsal-clustering cluster, then turn off the NIC on one of the nodes for about 15 seconds and turn it back on.
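The outage step above could be scripted roughly as follows. This is only a sketch under assumptions not stated in the report: the node is a Linux host with iproute2 installed, the script runs as root, and the interface name (`eth0` in the usage note) is hypothetical.

```python
import subprocess
import time


def nic_toggle_commands(interface):
    """Build the iproute2 commands that take a NIC down and bring it back up."""
    down = ["ip", "link", "set", "dev", interface, "down"]
    up = ["ip", "link", "set", "dev", interface, "up"]
    return down, up


def simulate_outage(interface, seconds=15, run=subprocess.check_call, sleep=time.sleep):
    """Disable the NIC, wait, then re-enable it even if the wait is interrupted."""
    down, up = nic_toggle_commands(interface)
    run(down)
    try:
        sleep(seconds)
    finally:
        # Always restore the link so the node is not left cut off.
        run(up)
```

On one cluster node, `simulate_outage("eth0", 15)` would approximate the 15-second outage. The `run` and `sleep` parameters exist only so the sequence can be exercised without touching a real interface.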

      The temporarily lost node will not successfully reconnect until its controller process is restarted, apparently by design based on the INFO log messages.

      I'm not sure why the restart is required, but if this remains necessary then we'll need the node to auto-restart itself if it detects that it's being quarantined for lack of restart.

      Especially in lab situations where clustering confidence is first established, switches frequently get shut down for more than 15 seconds, and our SDN controller's cluster should auto-recover from this, preferably without orchestration hacks to cover for it.

      2014-09-22 14:02:42,521 | WARN | lt-dispatcher-17 | Remoting | 234 - com.typesafe.akka.slf4j - 2.3.4 | Tried to associate with unreachable remote address [akka.tcp://opendaylight-cluster-data@192.168.1.26:2550]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
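As a stopgap, an external watchdog could recognise the quarantine condition from the log line above and then trigger a restart. The following is a minimal sketch of the detection part only. It assumes the message wording stays exactly as logged here, and the function names are hypothetical, not part of the controller or of Akka.

```python
import re

# Phrase taken verbatim from the WARN message in this report.
QUARANTINE_MARKER = "The remote system has quarantined this system"

# Captures the akka.tcp address printed in square brackets.
ADDRESS_RE = re.compile(r"\[(akka\.tcp://[^\]]+)\]")


def is_quarantined(line):
    """Return True if the log line reports that this node was quarantined."""
    return QUARANTINE_MARKER in line


def remote_address(line):
    """Return the remote akka.tcp address mentioned in the line, or None."""
    match = ADDRESS_RE.search(line)
    return match.group(1) if match else None
```

A watchdog would feed each new log line to `is_quarantined` and restart the controller process on the first match. How that restart is performed depends on the deployment and is deliberately left out.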

            Assignee: Unassigned
            Reporter: James Gregory Hall (jameshall03885@gmail.com)