[OPNFLWPLUG-318] ConnectionConductorImpl and HandshakeManagerImpl leak threads on exception paths Created: 14/Nov/14  Updated: 27/Sep/21  Resolved: 14/Jan/15

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jim West Assignee: Martin Bobak
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File ConnectionConductorImpl.java.patch     Text File HandshakeListener.java.patch     Text File HandshakeManagerImpl.java.patch    
External issue ID: 2394

 Description   

I'm running ODL on the latest stable/helium branches.

I have a single controller and a single switch. The OpenFlow connection between the switch and the controller is OF13. The connection between the switch and controller has been bouncing frequently.

This AM, I found my switch with 571 OFHandshake threads. 571 threads for a SINGLE connection.

I checked by switch and controller. There was one connection between the two, but 571 threads all in a waiting state

Upon code inspection, I believe I've found code paths where ConnectionConductorImpl will leak instances of ThreadPoolLoggingExecutor. Here's what I believe is happening:

1- The switch initiates an OF connection to the controller
This causes the following calls
initChannel method of TcpChannelInitializer
onSwitchConnected method of org.opendaylight.openflowplugin.openflow.md.core.SwitchConnectionHandlerImpl
ConnectionConductorFactory.createConductor(...)
ConnectionConductor connectionConductor = new ConnectionConductorImpl(connectionAdapter);
This CTOR registers itself as the HandshakeListener
...
connectionConductor.init();
This call creates an instance of ThreadPoolLoggingExecutor to be created

If there is an Exception in initChannel, the catch blocks close the channel, but the do not call any methods that will cause the ThreadPoolLoggingExecutor to shutdown

2- The newly create thread from ThreadPoolLoggingExecutor calls the shake() method of ConnectionConductorImpl
In the ConnectionConductorImpl, the calls to
hsPool.shutdown();
hsPool.purge();
only happen when postHandshakeBasic is called
postHandshakeBasic is called by onHandshakeSuccessfull

HOWEVER in HandshakeManager.shake() there are code paths where the 'onHandshakeSuccessfull' method is never called (the catch block around line 116 of HandshakeManager. The exception path DOES close the connection, but never shuts down the ThreadPoolLoggingExecutor

PROPOSAL:

I believe the HandshakeListener needs a second method:
onHandshakeFailure()
The implementors of this interface can then clean up any resources they've allocated.



 Comments   
Comment by Jim West [ 14/Nov/14 ]

Proposed change: Add Method to openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/HandshakeListener.java

to allow implementers to know when a OF handshake has failed.

Comment by Jim West [ 14/Nov/14 ]

Attachment HandshakeListener.java.patch has been added with description: Proposed change: Add Method to openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/HandshakeListener.java

Comment by Jim West [ 14/Nov/14 ]

When an OF handshake exchange fails because of some exception, tell the handshake listener that it should clean up any resources (threads)

Comment by Jim West [ 14/Nov/14 ]

Attachment HandshakeManagerImpl.java.patch has been added with description: Proposed change openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/HandshakeManagerImpl.java

Comment by Jim West [ 14/Nov/14 ]

Add a method to shutdown the hsPool on handshake failures.

Comment by Jim West [ 14/Nov/14 ]

Attachment ConnectionConductorImpl.java.patch has been added with description: proposed change: openflowplugin/src/main/java/org/opendaylight/openflowplugin/openflow/md/core/ConnectionConductorImpl.java

Comment by Jim West [ 14/Nov/14 ]

I've created a patch and I'm testing this on my systems.

Comment by Martin Bobak [ 18/Dec/14 ]

https://git.opendaylight.org/gerrit/#/c/13739/

Comment by Anil Vishnoi [ 06/Jan/15 ]

Hi Jim,

Did you get a chance to test the patch martin pushed?

Thanks
Anil

Comment by Jim West [ 07/Jan/15 ]

I integrated Martin's patch with the patch that I had created. This merged patch appears to be working fine on my test system.

Generated at Wed Feb 07 20:32:09 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.