[CONTROLLER-694] close not called on a BindingAwareProvider when it is reinitialized Created: 15/Aug/14  Updated: 19/Aug/14  Resolved: 19/Aug/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: Moiz Raja
Assignee: Maros Marsalek
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Mac OS
Platform: PC


External issue ID: 1558
Priority: High

 Description   

In Helium, MD-SAL based clustering is enabled by placing 05-clustering.xml in the configuration/initial folder. This clustering configuration provides a different configuration of the DOMDataBroker, one that uses the distributed data store instead of the in-memory data store.

The config subsystem appears to work as follows: it first loads the DOMDataBroker with the configuration provided in 01-md-sal.xml; then, when it encounters 05-clustering.xml, it recreates the DOMDataBroker with the new configuration and reinitializes all its dependencies.

If you have a BindingAwareProvider such as FlowCapableInventoryProvider or FlowCapableTopologyProvider, you will see that its onSessionInitiated is called twice in this scenario. The close method on the provider is never called.

In onSessionInitiated, FlowCapableInventoryProvider and FlowCapableTopologyProvider each create a thread that processes operations from a blocking queue. This queue is fed by notifications coming from openflowplugin. Because onSessionInitiated is called twice in this scenario, two instances of the listeners are registered, and both receive notifications from openflowplugin. However, due to an exception, one of the threads processing the operations dies. The blocking queue of the instance with the dead thread then eventually fills up and blocks every notification thread in the system.
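
The queue-filling behaviour described above can be reproduced in isolation. The following is only a minimal sketch, not code from either provider: the class name DeadConsumerSketch, the method countAccepted, and the capacity and timeout values are all hypothetical. It assumes a bounded queue with a single consumer thread that handles one operation and then dies from an unchecked exception. A timed offer() stands in for the blocking put() so that the demo terminates instead of hanging the way the real notification threads would.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the controller or openflowplugin code base.
final class DeadConsumerSketch {

    /**
     * Offers {@code attempts} operations to a bounded queue whose only consumer
     * handles one operation and then dies. Returns how many operations were
     * accepted before the producer started timing out.
     */
    static int countAccepted(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // process exactly one operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Simulated unexpected failure: the thread exits and nothing drains the queue again.
            throw new IllegalStateException("simulated processing failure");
        });
        // Swallow the failure to keep the demo output quiet; a real system would log it.
        consumer.setUncaughtExceptionHandler((thread, error) -> { });
        consumer.start();

        int accepted = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                // A plain put() would block forever once the queue is full and the consumer is dead.
                if (queue.offer("op-" + i, 50, TimeUnit.MILLISECONDS)) {
                    accepted++;
                }
                if (i == 0) {
                    consumer.join(); // wait until the consumer has taken one item and died
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accepted;
    }

    public static void main(String[] args) {
        // One operation is consumed, two more fill the queue, the rest are rejected.
        System.out.println("accepted = " + countAccepted(2, 5));
    }
}
```

With a capacity of 2 and 5 attempts, only 3 operations are accepted: the one the consumer took before dying plus the two that fill the queue. Every later producer call would block indefinitely if it used put(), which matches the stalled notification threads described in this report.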

Consequently the system does not work as expected.

This is a critical bug which needs to be addressed ASAP. If it is not fixed, we will not be able to use the configuration merge facility of the config subsystem. Enabling clustering would then require commenting out and uncommenting sections of 01-md-sal.xml, which is fragile.



 Comments   
Comment by Maros Marsalek [ 18/Aug/14 ]

Hi Moiz,

Those two modules (topology manager and inventory manager) do not use the config subsystem, only the OSGi providers from MD-SAL. However, these providers ignore removedService events.
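
To illustrate why ignoring removedService matters, here is a minimal sketch. It is not the change in the Gerrit review linked below and does not use the real OSGi or MD-SAL APIs: SketchProvider, BrokerTracker, and the honorRemoval flag are hypothetical names invented for this example. The callbacks are only loosely modeled on the addingService and removedService methods of an OSGi ServiceTrackerCustomizer. The sketch assumes that each onSessionInitiated call registers one listener and that close releases all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these types exist in MD-SAL or OSGi.
final class ProviderLifecycleSketch {

    /** Stand-in for a BindingAwareProvider-style component. */
    static final class SketchProvider implements AutoCloseable {
        private final List<String> activeListeners = new ArrayList<>();
        private int sessions = 0;

        /** Each session registers one more listener. */
        void onSessionInitiated() {
            sessions++;
            activeListeners.add("listener-" + sessions);
        }

        /** Releases every listener registered so far. */
        @Override
        public void close() {
            activeListeners.clear();
        }

        int activeListenerCount() {
            return activeListeners.size();
        }
    }

    /** Tracker-style callbacks that forward broker lifecycle events to the provider. */
    static final class BrokerTracker {
        private final SketchProvider provider;
        private final boolean honorRemoval;

        BrokerTracker(SketchProvider provider, boolean honorRemoval) {
            this.provider = provider;
            this.honorRemoval = honorRemoval;
        }

        void addingService() {
            provider.onSessionInitiated();
        }

        void removedService() {
            if (honorRemoval) {
                provider.close(); // release resources tied to the old broker
            }
            // When honorRemoval is false the event is ignored, as in the reported bug.
        }
    }

    /** Simulates the broker being created, replaced, and created again. */
    static int listenersAfterBrokerReplaced(boolean honorRemoval) {
        SketchProvider provider = new SketchProvider();
        BrokerTracker tracker = new BrokerTracker(provider, honorRemoval);
        tracker.addingService();  // broker from the first configuration
        tracker.removedService(); // broker is being replaced
        tracker.addingService();  // broker from the merged configuration
        return provider.activeListenerCount();
    }

    public static void main(String[] args) {
        System.out.println("ignoring removal: " + listenersAfterBrokerReplaced(false));
        System.out.println("honoring removal: " + listenersAfterBrokerReplaced(true));
    }
}
```

When the removal event is ignored, two listeners remain active after the broker is replaced, which corresponds to the duplicate registrations in the description. When the event is forwarded to close, only the listener for the current broker remains.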

Implemented fix:

https://git.opendaylight.org/gerrit/#/c/10013/

I tried the 05-clustering config with this fix and saw only interrupted exceptions from threads in these two modules.

Generated at Wed Feb 07 19:53:39 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.