OpenFlowPlugin / OPNFLWPLUG-823

[Boron] Switch connection sometimes bounces on non-owner node when OWNER is shut down

Details

    • Bug
    • Status: Resolved
    • Resolution: Done
    • General
    • Operating System: All
      Platform: All

    • 7184

    Description

      In a 3-node cluster we check the entity-ownership API to determine which controller is the owner for some OpenFlow devices. We then stop the OWNER controller and observe the switch connections bounce on one of the non-owner nodes.

      In netstat the issue appears as a growing number of failed connections:
      [vagrant@localhost ~]$ netstat -an | grep 6633
      tcp6 0 0 :::6633 :::* LISTEN
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50811 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50791 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50772 CLOSE_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50768 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50813 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50784 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50785 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50778 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50780 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50797 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50788 CLOSE_WAIT
      tcp6 8 0 10.120.22.207:6633 10.120.22.184:50823 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50774 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50806 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50792 CLOSE_WAIT
      tcp6 271 0 10.120.22.207:6633 10.120.22.184:46788 CLOSE_WAIT
      tcp6 8 0 10.120.22.207:6633 10.120.22.184:46785 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50786 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50787 TIME_WAIT
      tcp6 16 0 10.120.22.207:6633 10.120.22.184:46772 ESTABLISHED
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50808 CLOSE_WAIT
      tcp6 16 0 10.120.22.207:6633 10.120.22.184:46769 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50799 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50782 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50795 TIME_WAIT
      tcp6 8 0 10.120.22.207:6633 10.120.22.184:46766 ESTABLISHED
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50776 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50777 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50790 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50818 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50824 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50766 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50820 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50803 TIME_WAIT
      tcp6 271 0 10.120.22.207:6633 10.120.22.184:46770 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46786 ESTABLISHED
      tcp6 533 0 10.120.22.207:6633 10.120.22.184:46783 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50814 TIME_WAIT
      tcp6 7168 0 10.120.22.207:6633 10.120.22.184:46778 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50793 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50815 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46775 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50801 TIME_WAIT
      tcp6 16 0 10.120.22.207:6633 10.120.22.184:50828 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50798 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46781 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46790 ESTABLISHED
      tcp6 32 0 10.120.22.207:6633 10.120.22.184:50826 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50819 TIME_WAIT
      tcp6 16 0 10.120.22.207:6633 10.120.22.184:50827 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50767 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50822 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50796 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50810 TIME_WAIT
      tcp6 8 0 10.120.22.207:6633 10.120.22.184:46774 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50809 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50807 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50800 CLOSE_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50816 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46779 ESTABLISHED
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50821 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50802 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50789 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50794 TIME_WAIT
      tcp6 433 0 10.120.22.207:6633 10.120.22.184:46777 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50805 TIME_WAIT
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50812 CLOSE_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:50817 TIME_WAIT
      tcp6 0 0 10.120.22.207:6633 10.120.22.184:46784 ESTABLISHED
      tcp6 17 0 10.120.22.207:6633 10.120.22.184:50804 CLOSE_WAIT
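
A listing this long is easier to compare between runs when it is reduced to per-state counts. The following is a minimal sketch, not part of the original report: the function name `count_states` and the `SAMPLE` constant are hypothetical, and it assumes the six-column `netstat -an` layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```python
from collections import Counter


def count_states(netstat_output: str, port: int = 6633) -> Counter:
    """Count TCP connection states for sockets whose local address ends in :port."""
    counts = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state.
        # Anything else (shell prompt, headers, blank lines) is skipped.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix):
            counts[fields[5]] += 1
    return counts


# The first five lines of the listing above, used as sample input.
SAMPLE = """\
tcp6 0 0 :::6633 :::* LISTEN
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50811 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50791 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50772 CLOSE_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50768 CLOSE_WAIT
"""

print(count_states(SAMPLE))
```

Running this against successive `netstat -an` captures should show whether the CLOSE_WAIT and TIME_WAIT counts keep growing while the bounce is in progress.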

      In the karaf.log we see the following sequence:
      2016-11-09 00:10:28,235 | INFO | ntLoopGroup-11-1 | ConnectionAdapterImpl | 277 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.1.Boron-SR1 | Hello received / branch
      2016-11-09 00:10:28,242 | WARN | ntLoopGroup-11-1 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Node openflow:102 already connected disconnecting device. Rejecting connection
      2016-11-09 00:10:28,242 | WARN | ntLoopGroup-11-1 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Node openflow:102 context state not in TERMINATION state.
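
To gauge how often each switch is being rejected, the "already connected" warnings can be tallied per node. This is a rough sketch under stated assumptions: the helper name `count_rejections` is hypothetical, and the pattern is derived only from the warning text quoted above, so it may need adjusting for other OpenFlowPlugin versions.

```python
import re
from collections import Counter

# Matches the DeviceManagerImpl warning quoted above and captures the node id.
REJECT_RE = re.compile(r"Node (openflow:\d+) already connected")


def count_rejections(log_text: str) -> Counter:
    """Count 'already connected' rejections per node id in karaf.log text."""
    counts = Counter()
    for line in log_text.splitlines():
        # karaf.log fields are separated by " | "; the message is the last field.
        message = line.rsplit(" | ", 1)[-1]
        match = REJECT_RE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the contents of karaf.log to `count_rejections` gives a per-node count; a count that keeps rising for a single node would be consistent with the bouncing described here.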

      Here is the full TRACE log for openflowplugin and openflowjava:

      https://gist.github.com/evanzeller/371612c402ef856ad5153ec3ec635817

      In this state, traffic relying on existing flows stops working. The state persists until the offending node is restarted, after which the controller recovers.

          People

            Avishnoi Anil Vishnoi
            evanz Evan Zeller