|
In a 3-node cluster we check the entity-ownership API to determine which controller is the owner for some OpenFlow devices. We then stop the OWNER controller and observe the connections bouncing on one of the non-owner nodes.
In netstat the issue appears as a growing number of connections left in CLOSE_WAIT and TIME_WAIT:
[vagrant@localhost ~]$ netstat -an | grep 6633
tcp6 0 0 :::6633 :::* LISTEN
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50811 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50791 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50772 CLOSE_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50768 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50813 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50784 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50785 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50778 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50780 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50797 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50788 CLOSE_WAIT
tcp6 8 0 10.120.22.207:6633 10.120.22.184:50823 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50774 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50806 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50792 CLOSE_WAIT
tcp6 271 0 10.120.22.207:6633 10.120.22.184:46788 CLOSE_WAIT
tcp6 8 0 10.120.22.207:6633 10.120.22.184:46785 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50786 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50787 TIME_WAIT
tcp6 16 0 10.120.22.207:6633 10.120.22.184:46772 ESTABLISHED
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50808 CLOSE_WAIT
tcp6 16 0 10.120.22.207:6633 10.120.22.184:46769 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50799 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50782 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50795 TIME_WAIT
tcp6 8 0 10.120.22.207:6633 10.120.22.184:46766 ESTABLISHED
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50776 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50777 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50790 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50818 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50824 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50766 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50820 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50803 TIME_WAIT
tcp6 271 0 10.120.22.207:6633 10.120.22.184:46770 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46786 ESTABLISHED
tcp6 533 0 10.120.22.207:6633 10.120.22.184:46783 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50814 TIME_WAIT
tcp6 7168 0 10.120.22.207:6633 10.120.22.184:46778 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50793 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50815 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46775 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50801 TIME_WAIT
tcp6 16 0 10.120.22.207:6633 10.120.22.184:50828 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50798 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46781 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46790 ESTABLISHED
tcp6 32 0 10.120.22.207:6633 10.120.22.184:50826 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50819 TIME_WAIT
tcp6 16 0 10.120.22.207:6633 10.120.22.184:50827 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50767 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50822 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50796 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50810 TIME_WAIT
tcp6 8 0 10.120.22.207:6633 10.120.22.184:46774 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50809 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50807 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50800 CLOSE_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50816 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46779 ESTABLISHED
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50821 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50802 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50789 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50794 TIME_WAIT
tcp6 433 0 10.120.22.207:6633 10.120.22.184:46777 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50805 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50812 CLOSE_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50817 TIME_WAIT
tcp6 0 0 10.120.22.207:6633 10.120.22.184:46784 ESTABLISHED
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50804 CLOSE_WAIT
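To quantify the growth over time, the netstat output can be tallied by TCP state. The following is a minimal sketch in Python, assuming the six-column `netstat -an` layout shown above (proto, Recv-Q, Send-Q, local address, remote address, state); `count_tcp_states` is a hypothetical helper written for this report, not part of OpenDaylight or netstat:

```python
from collections import Counter


def count_tcp_states(netstat_output, port=6633):
    """Tally TCP states for sockets whose local address ends in :port.

    Assumes each line has the columns:
    proto, recv-q, send-q, local address, remote address, state.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip prompts, headers, and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(f":{port}"):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the output above, used as sample input.
    sample = """\
tcp6 0 0 :::6633 :::* LISTEN
tcp6 0 0 10.120.22.207:6633 10.120.22.184:50811 TIME_WAIT
tcp6 17 0 10.120.22.207:6633 10.120.22.184:50772 CLOSE_WAIT
tcp6 8 0 10.120.22.207:6633 10.120.22.184:50823 ESTABLISHED
"""
    for state, n in sorted(count_tcp_states(sample).items()):
        print(state, n)
```

Running this against successive `netstat -an` captures should show whether the CLOSE_WAIT count keeps climbing while the switch retries.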
In the karaf.log we see the following sequence:
2016-11-09 00:10:28,235 | INFO | ntLoopGroup-11-1 | ConnectionAdapterImpl | 277 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.8.1.Boron-SR1 | Hello received / branch
2016-11-09 00:10:28,242 | WARN | ntLoopGroup-11-1 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Node openflow:102 already connected disconnecting device. Rejecting connection
2016-11-09 00:10:28,242 | WARN | ntLoopGroup-11-1 | DeviceManagerImpl | 286 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Node openflow:102 context state not in TERMINATION state.
Here is the full TRACE log for openflowplugin and openflowjava:
https://gist.github.com/evanzeller/371612c402ef856ad5153ec3ec635817
In this state, traffic relying on existing flows stops working. The state persists until the offending node is restarted, after which the controller recovers.
|