[CONTROLLER-307] Lots of errors/exceptions on controller shutdown Created: 10/Apr/14 Updated: 19/Oct/17 Resolved: 05/May/15 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | adsal |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Tom Pantelis | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Linux |
||
| Issue Links: |
|
||||||||
| External issue ID: | 716 | ||||||||
| Description |
|
When you shutdown all the bundles via the equinox console ('shutdown' command), there are many logged errors and exceptions that occur. While most of these are likely benign as the entire app is shutting down anyway, we should strive for clean shutdown. Many of the errors emanate from the config subsystem initiated by the ModuleFactoryBundleTracker on bundle shutdown. ModuleFactoryBundleTracker.removedBundle invokes the BlankTransactionServiceTracker to perform a commit. I assume this is done to cleanup the modules/services from the cache. There is a comment that states "workaround for service tracker not getting removed service". I assume this is referring to the ServiceTracker for ModuleFactory services which also uses the BlankTransactionServiceTracker. It seems the intent was for the BlankTransactionServiceTracker.removedService method to be invoked to do the commit but it doesn't because the addingService method returns null. The ServiceTracker won't actually track a service and subsequently call removedService/modifiedService unless a non-null instance is returned from addingService. So addingService should extract and return the service instance from the ServiceReference. This obviates the need for the ModuleFactoryBundleTracker to do the work. Either way, the BlankTransactionServiceTracker commit fails with an IllegalStateException from ServiceReferenceRegistryImpl.saveServiceReference when copying the previous mBeans entries from the readable ServiceReferenceRegistryImpl to a new writable ServiceReferenceRegistryImpl instance. This is initiated by ConfigRegistryImpl.beginConfig. The IllegalStateException is thrown from here: Set<String> serviceInterfaceQNames = factoryNamesToQNames.get( The factoryNamesToQNames map is dynamically populated with the current ModuleFactory services from the BundleContext at the start of the transaction. However since this code is being invoked due to a ModuleFactory service that was just removed from the OSGi registry, that service won't be found. Since the service no longer exists, it can't be populated to the new writable ServiceReferenceRegistryImpl so it seems reasonable to ignore it and not throw an ex and fail the entire transaction. This method is called from the copy method which passes 'true' for skipChecks so it also seems reasonable to ignore the lookup failure and return based on skipChecks: if (serviceInterfaceQNames == null) { throw new IllegalStateException(...); After getting by that error, validation errors occurred from Modules during the commit due to service dependencies that no longer existed which failed the entire transaction. This is a tricky one. It seems to me that if a service is removed, it should invalidate other services that depend on it. These services should be properly removed from the cache and cleaned up rather than failing the transaction and leaving them. So in the copyExistingModulesAndProcessFactoryDiff phase of beginConfig, I prototyped changes to validate the previously cached Module prior to copying it to the new registry. If the validation fails then call destroyModule to clean it up (although it first had to be added to the DependencyResolver for this to work). I'm not sure what ramifications this might have, especially when a bundle is simply stopped then restarted. I'm new to ODL and I'm not familiar with the config subsystem. I don't know if it is intended to handle individual bundle restarts. If not and if the purpose of the BlankTransactionServiceTracker handling removed services is to cleanup the config cache, then we can simply do nothing when a service is removed. There were also some other miscellaneous errors on shutdown:
|
| Comments |
| Comment by Tom Pantelis [ 12/Apr/14 ] |
|
After stepping through the startup/wiring of the toaster sample, I have a better understanding of how the ConfigRegistryImpl is used. The config xml file in configurations/initial is pushed by the ConfogPersistorActivator to the ConfigRegistryImpl to create and advertise the toaster module's services. So it seems the ConfigRegistryImpl is the running config. Given that, I don't understand what the purpose is for using a "blank" transaction on ModuleFactory bundle shutdown or module service removal. It previously failed. With my prototyped changes, it succeeds but a bundle restart doesn't work correctly. It seems that bundle restarts wasn't designed/factored into the initial implementation. I would think we'd want to issue a "delete" transaction to cleanup the cache on ModuleFactory bundle shutdown. So my proposal is to comment out the call to the BlankTransactionServiceTracker on shutdown. |
| Comment by Tom Pantelis [ 14/Apr/14 ] |
|
Changes submitted with: |
| Comment by Tomas Olvecky [ 15/Apr/14 ] |
|
Thanks for the analysis, for config subsystem it is really tricky: it assumes that config-manager is stopped before any bundle that is currently instantiated by c-s. Otherwise the system will go into state that is not defined: each module can decide not to allow next transaction until its dependencies are met. I see two possible solutions: 1, investigate if OSGi has notion of something like stop levels - and force config-manager to be stopped before bundles managed by it. 2, use custom shutdown mechanism instead that would allow such operation. There is already an netconf RPC handler (config/shutdown-impl) that right now just stops system bundle but could be extended with this functionality. |
| Comment by Tom Pantelis [ 15/Apr/14 ] |
|
(In reply to Tomas Olvecky from comment #3) Thanks for your comment. Equinox does allow you to define start levels for bundles. Bundles are shutdown in reverse order. In order for the config-manager bundle to stop first it would have to have a higher start level than any module bundle. I'm not sure if that's desirable. The ShutdownServicImpl in the shutdown-impl bundle first shuts down the config-manager bundle and then the system bundle to shutdown all the rest. I'm not clear on how this service is invoked (thru JMX?) but it appears to be an unconventional (and undocumented?) way to shutdown the app - normally it's done via the OSGi container or of running under a wrapper like karaf. That said, I think it's fragile to assume a certain shutdown order or assme how the app will be shutdown. The controller is intended to basically be a set of bundles to build a product app around with its own packaging (may not use equinox and/or may use karaf which comes with its own scripts to start/stop). It appears ConfigRegistryImpl.close does handle shutdown properly. It aborts current transactions, destroys modules etc. It is invoked by the config-manager bundles's activator. So I'm not clear on why you indicate the config-manager bundle has to stop first and how the system might go into an undefined state otherwise. A walkthrough of the config subsystem would help me to understand it more. As such, issuing the "blank" transaction on shutdown clearly doesn't work so I have removed that to avoid the flood of errors. If more work needs to be done to properly handle shutdown than I could open a new bug. |
| Comment by Tomas Olvecky [ 22/Apr/14 ] |
|
(In reply to Tom Pantelis from comment #4) Thanks, was not sure about it and agree that it is not desirable. > The ShutdownServicImpl in the shutdown-impl bundle first shuts down the The shutdown bundle exposes this functionality via netconf. > That said, I think it's fragile to assume a certain shutdown order or assme Agreed. > It appears ConfigRegistryImpl.close does handle shutdown properly. It aborts Config manager reads ModuleFactories from SR. In order to move consistently from transaction to transaction it needs to find all currently registered factories that have committed beans. > As such, issuing the "blank" transaction on shutdown clearly doesn't work so I have created |
| Comment by Tom Pantelis [ 22/Apr/14 ] |
|
(In reply to Tomas Olvecky from comment #5) Tomas - I assume your -1 on the gerrit is to remove the commenting out of the blank transaction in the config manager code in lieu of Let me know if you agree. |
| Comment by Tom Pantelis [ 22/Apr/14 ] |
|
Tomas - disregard my last comment. I see you already submitted changes with https://git.opendaylight.org/gerrit/#/c/6312/. I assume then I should back out all my changes to the config subsytem, specifically leaving in the call to the blankTransactionService in ModuleFactoryBundleTracker.removeBundle as it was? |
| Comment by Tomas Olvecky [ 23/Apr/14 ] |
|
Yes, please. With my patch applied, there are still some exceptions caused by md-sal Modules not properly closing their resources, but I wanted to keep it out of the first patch. So please remove config-manager modifications, keep non config-subsystem related fixes. Regarding md-sal Module exceptions, please either fix them in your commit or we can create a separate bug/patch for those. (In reply to Tom Pantelis from comment #7) |
| Comment by Carol Sanders [ 05/May/15 ] |
|
This bug is part of the project to Move all ADSAL associated component bugs to ADSAL. |