[CONTROLLER-694] close not called on a BindingAwareProvider when it is reinitialized Created: 15/Aug/14 Updated: 19/Aug/14 Resolved: 19/Aug/14 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Moiz Raja | Assignee: | Maros Marsalek |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: Mac OS | ||
| External issue ID: | 1558 |
| Priority: | High |
| Description |
|
In Helium, MD-SAL-based clustering is enabled by placing 05-clustering.xml in the configuration/initial folder. This clustering configuration provides a different configuration of the DOMDataBroker, one that uses the distributed data store instead of the in-memory data store. The config subsystem appears to first load the DOMDataBroker with the configuration provided in 01-md-sal.xml; when it then encounters 05-clustering.xml, it recreates the DOMDataBroker with the new configuration and reinitializes all its dependencies.

If you have a BindingAwareProvider such as FlowCapableInventoryProvider or FlowCapableTopologyProvider, you will see that onSessionInitiated is called twice in this scenario. The close method on the provider is never called.

In onSessionInitiated, both FlowCapableInventoryProvider and FlowCapableTopologyProvider create a thread that processes operations from a blocking queue. This queue is fed by notifications coming from openflowplugin. Because onSessionInitiated is called twice, two instances of the listeners are registered, and both instances receive notifications from openflowplugin. However, due to an exception, one of the threads processing the operations dies. The blocking queue of the instance with the dead thread eventually fills up and blocks every notification thread in the system. Consequently, the system does not work as expected.

This is a critical bug that needs to be addressed ASAP. If it is not fixed, we will not be able to use the configuration merge facility of the config subsystem. Enabling clustering would then require commenting and uncommenting sections of 01-md-sal.xml, which is fragile. |
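The failure mode described above can be illustrated with a minimal plain-Java sketch. It is not the code of FlowCapableInventoryProvider or FlowCapableTopologyProvider; the names `ReinitSafeProvider`, `QueueWorker`, and `awaitStopped` are hypothetical, and the MD-SAL session argument is omitted. It only shows one way a provider could tolerate a second onSessionInitiated call: stop the previous queue worker before starting a new one, so that no orphaned queue is left to fill up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical provider that survives being initialized more than once. */
public class ReinitSafeProvider implements AutoCloseable {

    /** One bounded queue plus the single thread that drains it. */
    public static final class QueueWorker implements AutoCloseable {
        public final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(16);
        public final Thread thread = new Thread(this::drain, "queue-worker");

        QueueWorker() {
            thread.setDaemon(true);
            thread.start();
        }

        private void drain() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    queue.take().run(); // blocks until an operation arrives
                }
            } catch (InterruptedException e) {
                // An interrupt is the expected shutdown signal in this sketch.
            }
        }

        @Override
        public void close() {
            thread.interrupt();
        }

        /** Waits up to the given time for the worker thread to exit. */
        public boolean awaitStopped(long millis) {
            try {
                thread.join(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !thread.isAlive();
        }
    }

    private QueueWorker current;

    /** Stand-in for onSessionInitiated; safe to call repeatedly. */
    public synchronized QueueWorker onSessionInitiated() {
        close(); // release the worker left over from a previous session
        current = new QueueWorker();
        return current;
    }

    @Override
    public synchronized void close() {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

With this guard, a second initialization leaves exactly one live worker. It does not remove the need for the framework to call close when the broker is recreated; it only limits the damage when that call is missing.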
| Comments |
| Comment by Maros Marsalek [ 18/Aug/14 ] |
|
Hi Moiz, the two modules in question (topology manager and inventory manager) do not use the config subsystem; they use only the OSGi providers from MD-SAL. However, these providers ignore removedService events. I implemented a fix: https://git.opendaylight.org/gerrit/#/c/10013/ I tried the 05-clustering config with this fix and saw only interrupted exceptions from threads in these two modules. |
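The idea of honoring removedService events can be sketched as follows. This is not the change from the Gerrit review above; it is a self-contained illustration in which `Broker`, `Provider`, and `BrokerTracker` are hypothetical stand-ins, and the OSGi types are left out on purpose. The two callbacks only mirror the addingService/removedService pair of an OSGi service tracker customizer.

```java
import java.util.ArrayList;
import java.util.List;

public class RemovedServiceSketch {

    /** Hypothetical stand-in for the broker service that gets recreated. */
    public static final class Broker {
        public final String name;

        public Broker(String name) {
            this.name = name;
        }
    }

    /** Hypothetical provider that records its lifecycle calls. */
    public static final class Provider {
        public final List<String> events = new ArrayList<>();

        public void onSessionInitiated(Broker broker) {
            events.add("init:" + broker.name);
        }

        public void close() {
            events.add("close");
        }
    }

    /** Forwards tracker callbacks to the provider instead of ignoring removal. */
    public static final class BrokerTracker {
        private final Provider provider;

        public BrokerTracker(Provider provider) {
            this.provider = provider;
        }

        public void addingService(Broker broker) {
            provider.onSessionInitiated(broker);
        }

        public void removedService(Broker broker) {
            provider.close(); // the step that was being skipped
        }
    }

    /** Replays the sequence from the description: load, replace, reload. */
    public static List<String> simulateReconfiguration() {
        Provider provider = new Provider();
        BrokerTracker tracker = new BrokerTracker(provider);
        Broker inMemory = new Broker("in-memory");
        tracker.addingService(inMemory);
        tracker.removedService(inMemory);
        tracker.addingService(new Broker("distributed"));
        return provider.events;
    }
}
```

In this model each initialization is paired with a close before the next one begins, which is the ordering the description reports as missing.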