[CONTROLLER-1041] Clustering : ODL startup occasionally fails Created: 24/Nov/14  Updated: 19/Oct/17  Resolved: 07/Jan/15

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File karaf.log.xz     File karaf.log.xz     File test_20141203.sh    
External issue ID: 2425
Priority: Low

 Description   

With about 10% probability the ODL startup fails. The jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore keeps returning "InstanceNotFound":

curl http://127.0.0.1:8181/jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore

{"error_type":"javax.management.InstanceNotFoundException","error":"javax.management.InstanceNotFoundException : org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore","status":404,"request":

{"mbean":"org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore","type":"read"}

,"stacktrace":"javax.management.InstanceNotFoundException: org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(Unknown Source)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(Unknown Source)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(Unknown Source)
at org.jolokia.handler.ReadHandler$1.execute(ReadHandler.java:46)
at org.jolokia.handler.ReadHandler$1.execute(ReadHandler.java:41)
at org.jolokia.backend.executor.AbstractMBeanServerExecutor.call(AbstractMBeanServerExecutor.java:90)
at org.jolokia.handler.ReadHandler.getMBeanInfo(ReadHandler.java:232)
at org.jolokia.handler.ReadHandler.getAllAttributesNames(ReadHandler.java:244)
at org.jolokia.handler.ReadHandler.resolveAttributes(ReadHandler.java:216)
at org.jolokia.handler.ReadHandler.fetchAttributes(ReadHandler.java:184)
at org.jolokia.handler.ReadHandler.doHandleRequest(ReadHandler.java:119)
at org.jolokia.handler.ReadHandler.doHandleRequest(ReadHandler.java:37)
at org.jolokia.handler.JsonRequestHandler.handleRequest(JsonRequestHandler.java:160)
at org.jolokia.backend.MBeanServerHandler.dispatchRequest(MBeanServerHandler.java:97)
at org.jolokia.backend.LocalRequestDispatcher.dispatchRequest(LocalRequestDispatcher.java:91)
at org.jolokia.backend.BackendManager.callRequestDispatcher(BackendManager.java:388)
at org.jolokia.backend.BackendManager.handleRequest(BackendManager.java:150)
at org.jolokia.http.HttpRequestHandler.executeRequest(HttpRequestHandler.java:197)
at org.jolokia.http.HttpRequestHandler.handleGetRequest(HttpRequestHandler.java:86)
at org.jolokia.http.AgentServlet$3.handleRequest(AgentServlet.java:298)
at org.jolokia.http.AgentServlet.handle(AgentServlet.java:229)
at org.jolokia.http.AgentServlet.doGet(AgentServlet.java:194)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:575)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:69)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:240)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:77)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
"}

When investigating the log the following suspicious exception is the first thing that catched my eyes:

org.opendaylight.controller.config.api.JmxAttributeValidationException: Found module factory does not expose expected service interface. Module name is dom-broker-impl : org.opendaylight.controller.config.yang.md.sal.dom.impl.DomBrokerImplModuleFactory@42b327c2, expected service interface interface org.opendaylight.controller.config.yang.md.sal.dom.DOMDataBrokerServiceInterface, dependent module ON org.opendaylight.controller:instanceName=inmemory-dom-broker,type=Module,moduleFactoryName=dom-broker-impl , attribute JmxAttribute

{'DomAsyncBroker'}

at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverImpl.validateDependency(DependencyResolverImpl.java:128)
at org.opendaylight.controller.config.yang.md.sal.binding.impl.AbstractBindingAsyncDataBrokerImplModule.validate(AbstractBindingAsyncDataBrokerImplModule.java:55)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)[:1.7.0_67]
at java.lang.reflect.Method.invoke(Unknown Source)[:1.7.0_67]
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:149)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:70)
at com.sun.proxy.$Proxy72.validate(Unknown Source)
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.validateNoLocks(ConfigTransactionControllerImpl.java:330)
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.validateConfig(ConfigTransactionControllerImpl.java:314)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.7.0_67]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)[:1.7.0_67]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)[:1.7.0_67]
at java.lang.reflect.Method.invoke(Unknown Source)[:1.7.0_67]
at sun.reflect.misc.Trampoline.invoke(Unknown Source)[:1.7.0_67]
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)[:1.7.0_67]
at java.lang.reflect.Method.invoke(Unknown Source)[:1.7.0_67]
at sun.reflect.misc.MethodUtil.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MXBeanProxy$InvokeHandler.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(Unknown Source)[:1.7.0_67]
at javax.management.MBeanServerInvocationHandler.invoke(Unknown Source)[:1.7.0_67]
at com.sun.proxy.$Proxy71.validateConfig(Unknown Source)[180:org.opendaylight.controller.config-api:0.3.0.SNAPSHOT]
at org.opendaylight.controller.config.util.ConfigTransactionJMXClient.validateConfig(ConfigTransactionJMXClient.java:124)[181:org.opendaylight.controller.config-util:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.confignetconfconnector.transactions.TransactionProvider.validateTestTransaction(TransactionProvider.java:154)[198:org.opendaylight.controller.config-netconf-connector:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.confignetconfconnector.operations.editconfig.EditConfig.test(EditConfig.java:119)[198:org.opendaylight.controller.config-netconf-connector:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.confignetconfconnector.operations.editconfig.EditConfig.executeTests(EditConfig.java:94)[198:org.opendaylight.controller.config-netconf-connector:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.confignetconfconnector.operations.editconfig.EditConfig.getResponseInternal(EditConfig.java:73)[198:org.opendaylight.controller.config-netconf-connector:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.confignetconfconnector.operations.editconfig.EditConfig.handleWithNoSubsequentOperations(EditConfig.java:304)[198:org.opendaylight.controller.config-netconf-connector:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.util.mapping.AbstractLastNetconfOperation.handle(AbstractLastNetconfOperation.java:33)[197:org.opendaylight.controller.netconf-util:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.util.mapping.AbstractNetconfOperation.handle(AbstractNetconfOperation.java:100)[197:org.opendaylight.controller.netconf-util:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:285)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:241)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:133)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:109)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:76)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at org.opendaylight.controller.netconf.persist.impl.osgi.ConfigPersisterActivator$InnerCustomizer$1.run(ConfigPersisterActivator.java:175)[201:org.opendaylight.controller.config-persister-impl:0.3.0.SNAPSHOT]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]

There is another one before it but it is a known thing. There is more suspicious stuff inside, see the attached karaf.log.xz file.



 Comments   
Comment by Jozef Behran [ 24/Nov/14 ]

Attachment karaf.log.xz has been added with description: Log file with suspicious exceptions reported in warnings

Comment by Jozef Behran [ 24/Nov/14 ]

This particular log record looks suspicious:

2014-11-24 03:01:32,081 | WARN | r - ConfigPusher | FeatureConfigPusher | 202 - config-persister-feature-adapter - 0.3.0.SNAPSHOT | Karaf featuresServic
e.listInstalledFeatures() has thrown an exception, retry 0
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(Unknown Source)[:1.7.0_67]
at java.util.HashMap$KeyIterator.next(Unknown Source)[:1.7.0_67]
at java.util.AbstractCollection.toArray(Unknown Source)[:1.7.0_67]
at org.apache.karaf.features.internal.FeaturesServiceImpl.listInstalledFeatures(FeaturesServiceImpl.java:754)[24:org.apache.karaf.features.core:3.0.1]
at Proxy1b1786f9_6f7c_44ab_b811_972f28d509e4.listInstalledFeatures(Unknown Source)[:]
at org.opendaylight.controller.configpusherfeature.internal.FeatureConfigPusher.isInstalled(FeatureConfigPusher.java:87)[202:config-persister-feature-adapter:
0.3.0.SNAPSHOT]
at org.opendaylight.controller.configpusherfeature.internal.FeatureConfigPusher.pushConfig(FeatureConfigPusher.java:73)[202:config-persister-feature-adapter:0
.3.0.SNAPSHOT]
at org.opendaylight.controller.configpusherfeature.internal.FeatureConfigPusher.pushConfigs(FeatureConfigPusher.java:63)[202:config-persister-feature-adapter:
0.3.0.SNAPSHOT]
at org.opendaylight.controller.configpusherfeature.internal.ConfigPushingRunnable.processFeatureEvent(ConfigPushingRunnable.java:65)[202:config-persister-feat
ure-adapter:0.3.0.SNAPSHOT]
at org.opendaylight.controller.configpusherfeature.internal.ConfigPushingRunnable.run(ConfigPushingRunnable.java:47)[202:config-persister-feature-adapter:0.3.
0.SNAPSHOT]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]

Comment by Jozef Behran [ 24/Nov/14 ]

Attachment karaf.log.xz has been added with description: Another log file with a different suspicious exception

Comment by Jozef Behran [ 24/Nov/14 ]

Now I realized this was tested on Helium SR1. It is not clear which of the bugfixes made after this version are relevant to this bug.

Comment by Jozef Behran [ 24/Nov/14 ]

Additional observations:

  • The Helium SR1 used to exhibit this bug with about 5% to 10% probability but now the Helium Stable Snapshot exhibits this bug with 100% probability.
  • Additionally this bug was now confirmed to exist on the latest master build (distribution-karaf-0.3.0-20141122.031519-135.tar.gz) as well.
Comment by Jozef Behran [ 24/Nov/14 ]

Further observation revealed that the problem is most likely caused by "installing too many features at the same time". The ODL was seen to recover after about 5 to 10 minutes.

Comment by Vratko Polak [ 26/Nov/14 ]

Clarifications based on my experience:

90% of time, ODL startup does not fail entirely, it is just significantly prolonged (sometimes by ~10 minutes).

Behavior seem to depend on the way how jolokia-osgi bundle is being installed. For example older ODL builds can also exhibit startup delay when installed by new installation scripts.

The new installations scripts contain workarounds for CONTROLLER-1004. When that bug gets fixed, this bug may also go away.

Comment by Moiz Raja [ 02/Dec/14 ]

Vratko to share more information on how to reproduce this problem.

From initial input it seemed like simply installing the features,

odl-restconf
odl-mdsal-clustering

and the jolokia bundle

cause the slowdown. (Possibly because the jolokia url is not responding)

Comment by Vratko Polak [ 03/Dec/14 ]

I have created a small script to detect this bug, compressed and attached.

In a dozen of tries, it never detected this bug to be present on a recent build, neither in master branch nor in helium.

This bug still creates occasional delay when more complicated script is used, but it would take a long time to bisect which clustering config change is causing it.

Comment by Vratko Polak [ 03/Dec/14 ]

Attachment test_20141203.sh has been added with description: tester script

Comment by Moiz Raja [ 06/Jan/15 ]

If this is not occurring anymore I'd like to close this bug

Comment by Jozef Behran [ 07/Jan/15 ]

I did not observe this issue on the recent ODL builds. Tested hundreths of times without a single failure. Marking as "fixed".

Generated at Wed Feb 07 19:54:32 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.