[CONTROLLER-1246] Restarting controller after configuring netconf connector fails Created: 10/Apr/15  Updated: 25/Jul/23  Resolved: 26/May/15

Status: Resolved
Project: controller
Component/s: config
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis Assignee: Maros Marsalek
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
is duplicated by CONTROLLER-1284 Clustered datastore failing with canC... Resolved
External issue ID: 2976
Priority: Highest

 Description   

I configured a netconf connector as outlined in https://wiki.opendaylight.org/view/OpenDaylight_Controller:Config:Examples:Netconf#Spawning_Additional_Netconf_Connectors_While_the_Controller_is_Running. The config was saved to etc/opendaylight/current/controller.currentconfig.xml (along with all the other module configs).

I verified a GET on http://localhost:8181/restconf/config/network-topology:network-topology/topology/topology-netconf/node/controller-config/yang-ext:mount/config:modules returns all the modules correctly including the new netconf connector.

I then restarted the controller but it failed miserably with several exceptions:

Exception in thread "config-pusher" java.lang.IllegalStateException: Failed to send commit for configuration 05-clustering.xml(odl-netconf-connector,odl-mdsal-broker)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:285)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:239)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:128)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:101)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:76)
at org.opendaylight.controller.netconf.persist.impl.osgi.ConfigPersisterActivator$InnerCustomizer$1.run(ConfigPersisterActivator.java:181)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalStateException: Error - getInstance() failed for ModuleIdentifier

{factoryName='sal-netconf-connector', instanceName='new-netconf-device'} in transaction TransactionIdentifier{name='ConfigTransaction-46-48'}
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:405)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.secondPhaseCommit(ConfigRegistryImpl.java:280)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.commitConfig(ConfigRegistryImpl.java:227)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at com.sun.jmx.mbeanserver.MXBeanProxy$InvokeHandler.invoke(MXBeanProxy.java:150)
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)
at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:252)
at com.sun.proxy.$Proxy16.commitConfig(Unknown Source)
at org.opendaylight.controller.config.util.ConfigRegistryJMXClient.commitConfig(ConfigRegistryJMXClient.java:102)
at org.opendaylight.controller.netconf.confignetconfconnector.transactions.TransactionProvider.commitTransaction(TransactionProvider.java:138)
at org.opendaylight.controller.netconf.confignetconfconnector.operations.Commit.handleWithNoSubsequentOperations(Commit.java:54)
at org.opendaylight.controller.netconf.util.mapping.AbstractLastNetconfOperation.handle(AbstractLastNetconfOperation.java:33)
at org.opendaylight.controller.netconf.util.mapping.AbstractNetconfOperation.handle(AbstractNetconfOperation.java:100)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:280)
... 6 more
Caused by: java.lang.IllegalStateException: cannot create children while terminating or terminated
at akka.actor.dungeon.Children$class.makeChild(Children.scala:199)
at akka.actor.dungeon.Children$class.attachChild(Children.scala:41)
at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:553)
at org.opendaylight.controller.cluster.datastore.DistributedDataStore.<init>(DistributedDataStore.java:80)
at org.opendaylight.controller.cluster.datastore.DistributedDataStoreFactory.createInstance(DistributedDataStoreFactory.java:34)
at org.opendaylight.controller.config.yang.config.distributed_datastore_provider.DistributedOperationalDataStoreProviderModule.createInstance(DistributedOperationalDataStoreProviderModule.java:65)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:73)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
at com.sun.proxy.$Proxy74.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverImpl.resolveInstance(DependencyResolverImpl.java:159)
at org.opendaylight.controller.config.yang.config.concurrent_data_broker.AbstractDomConcurrentDataBrokerModule.resolveDependencies(AbstractDomConcurrentDataBrokerModule.java:91)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:72)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
at com.sun.proxy.$Proxy45.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverImpl.resolveInstance(DependencyResolverImpl.java:159)
at org.opendaylight.controller.config.yang.md.sal.dom.impl.AbstractDomBrokerImplModule.resolveDependencies(AbstractDomBrokerImplModule.java:70)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:72)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
at com.sun.proxy.$Proxy45.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverImpl.resolveInstance(DependencyResolverImpl.java:159)
at org.opendaylight.controller.config.yang.md.sal.connector.netconf.AbstractNetconfConnectorModule.resolveDependencies(AbstractNetconfConnectorModule.java:110)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:72)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
at com.sun.proxy.$Proxy74.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:399)
... 36 more


The clustering modules started up originally but then were shutdown by the config system. I assume the reason is somewhere in the exception stack traces.

It's curious that it fails trying to commit the 05-clustering.xml config. It's my understanding that the controller.currentconfig.xml overrides the initial configs in etc/opendaylight/karaf.

Also it's strange that the traces indicate the 05-clustering.xml config failure was caused by a failure in "ModuleIdentifier{factoryName='sal-netconf-connector', instanceName='new-netconf-device'}

". I don't see the causual relationship there.



 Comments   
Comment by Maros Marsalek [ 24/Apr/15 ]

Strange indeed... but it looks like a clash between current.config (yes, its should override the configs) and feature-config-pusher.

This failure occurs every time you try to start a reconfigured controller ? Or was just a one time thing ?

Comment by Tom Pantelis [ 24/Apr/15 ]

I tried it a couple times - happened each time. You should be able to reproduce it.

Comment by Tom Pantelis [ 01/May/15 ]

The feature config pusher and persistent config pusher are completely disconnected and do their own thing on startup.

It looks like the feature config pusher needs to drive it and merge the current config in some way. When a config from a feature is obtained, look for a corresponding config in the current config and replace it to preserve the current behavior. That seems fairly straightforward. The one tricky part I see is how to handle a netconf connector that was pushed via restconf and not via a feature.

The bigger issue is with upgrades. The current config system implementation is not upgradeable. For Li, the core mdsal and clustering wiring has changed and is incompatible with He. So if the netconf-connector was installed and a connector config was pushed via restconf in He, upgrade to Li will break.

The main problem is that it puts every module config in the current xml file even though 1 was added/changed. It would be nice if a restconf push only wrote the config that was added/changed but the way it's currently implemented generically using netconf and JMX, that looks problematic. Netconf is designed to get the entire config, merge the changes and commit.

A solution that comes to mind, at least for the short term, is that if you change wiring that makes it incompatible with previous versions, then you have to create a new config XML file (and bump the yang version) and ship that in the feature XML. As long as the feature config pusher drives the whole thing then the old file in initial would be ignored as well as the old config in the the current XML file. So, eg, 05-clustering.xml would become 06-clustering.xml with a new yang version. Assuming this works, I think that would get us over the upgrade hump from He to Li.

Comment by Tom Pantelis [ 02/May/15 ]

Pushed preliminary draft https://git.opendaylight.org/gerrit/#/c/19477 to to remove the current config push by the config persister and expose the Persister instance from the ConfigPusher interface so the feature config pusher can access it.

Comment by Tom Pantelis [ 02/May/15 ]

Expanding on the idea of creating a new config yang with a new version when incompatible changes are made, we could introduce the concept of an obsoleted capability (or config yang model). At a high level, if a config depends on an obsoleted capability then drop it. The feature config pusher could push the persisted configs first before pushing any feature configs. If a config from a feature was already pushed via persistence, then ignore it so it isn't pushed twice.

Comment by Maros Marsalek [ 07/May/15 ]

Hi Tom,

Could you grant me access to the draft you are mentioning in your comment ?

Thanks
Maros

Comment by Tom Pantelis [ 08/May/15 ]

The underlying cause for the CDS restart failure is outlined in https://bugs.opendaylight.org/show_bug.cgi?id=3156. However this bug is still valid - having both pushers doing the same thing on startup results in each module restarting on startup which can lead to sporadic system instability (also increases startup time).

Comment by Maros Marsalek [ 14/May/15 ]

Hi Tom,

Done some digging:

  • The concurrent push from current and feature pusher should be harmless in case where user only added new netconf connecter. Heres why: They both configure the same modules with the same configuration and the default behavior of modules/config-subsystem is that if the configuration did not change, the module is not recreated, the previous instance is kept. This means that if feature-pusher configured CDS first, the current pusher configuration for CDS was ignored. Basically the only thing from current that would be used is the configuration for new netconf connector. If this mechanism was not present, we might see this issue sooner for different modules. So it looks like the check whether the configuration in CDS is the same was always returning false. This was already fixed by Moiz (https://git.opendaylight.org/gerrit/#/c/19957/) and with this fix the clash between current and feature pusher should be harmless. (I was unable to reproduce the issue with his fix)
  • Moiz had to override the canReuse most likely because the canReuse does not work properly for DTOs .. if there is a DTO for config attributes, the instances of the DTO are not compared for equality properly. So instead returning always true (as Moiz did) we should fix the equals for DTOs because right now it would be impossible to reconfigure CDS.
  • Back to the clash of current vs feature pusher. The clash is potentially harmless only when users change the configuration of modules spawned by default. And thats because it might trigger instantiation and sudden reconfiguration of modules. This again should not be harmless for most modules but could cause issues with modules like CDS where the initialization takes quite some time and sudden reconfiguration might be problematic. So we should address the clash ... with the time for Lithium left I can see 2 options for fixing the clash:
    + In case both feature and current are present, current wins over feature with warnings from feature pusher
    + Current config will be changed so that it stores the current feature list as well and the feature pusher will ignore all features being installed that are mentioned in the current config snapshot. The other features will be scanned for config files and pushed as usual.

What do you say ?

Comment by Tom Pantelis [ 14/May/15 ]

Excellent analysis.

Yes - we changed CDS to always return true from canReuseInstance so it works now. You are correct about the config properties DTO's causing a false inequality in isSame in the generated code. That would be good to address but probably not in time for Li release. Hopefully SR1. Is there a bug for that already?

To preserve current functionality, current needs to win over configs that are also pushed via features.

As you mentioned, we should prevent 2 pushes when a config feature differs from current or it has config DTO's like CDS that cause false inequality. Doing 2 pushes would increase startup time and may cause startup instability as some app modules may not close or restart gracefully (app devs may not test dynamic restarts and I don't think it's a requirement for all apps to be "restartable").

Your proposal looks reasonable - that's essentially what I was thinking.

Making the mechanism automatically upgradable is another issue and not something we can address in time for Li unfortunately. That will take a lot more thought.

Comment by Maros Marsalek [ 15/May/15 ]

Regarding the missing hashCode/equals, I have already proposed a patch:
https://git.opendaylight.org/gerrit/#/q/message:BUG-2976

I pushed it as part of this bug, I didnt open another one.

Comment by Maros Marsalek [ 18/May/15 ]

https://git.opendaylight.org/gerrit/#/c/20656/1

Current config persister now stores the list of features from karaf with the config snapshot. Feature config pusher then checks that list whenever a new feature is installed.

Generated at Wed Feb 07 19:55:03 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.