[CONTROLLER-1867] Cluster: Member fail to start with odl-netconf-clustered-topology feature Created: 17/Oct/18 Updated: 12/Jan/19 Resolved: 07/Nov/18 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | blueprint |
| Affects Version/s: | Fluorine, Oxygen SR3 |
| Fix Version/s: | Neon, Fluorine SR1, Oxygen SR4 |
| Type: | Bug | Priority: | High |
| Reporter: | Luis Gomez | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
The issue only happens in oxygen and was hidden because current CSIT test installs all compatible features (all test), however if we run same test with just odl-netconf-clustered-topology feature (only test) we see that a cluster member that has been killed cannot be restarted: In the karaf log of the restarting member (1): we see this ERROR: 2018-10-17T19:33:22,521 | INFO | config-pusher | ConfigPusherImpl | 204 - org.opendaylight.controller.config-persister-impl - 0.8.4.SNAPSHOT | Pushing configuration snapshot 06-clustered-entity-ownership.xml(odl-netconf-connector,odl-netconf-connector)
2018-10-17T19:33:22,627 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | ActorSystemImpl | 50 - com.typesafe.akka.slf4j - 2.5.11 | Uncaught error from thread [opendaylight-cluster-data-akka.actor.default-dispatcher-4]: org/w3c/dom/ElementTraversal, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:763) ~[?:?]
at org.eclipse.osgi.internal.loader.ModuleClassLoader.defineClass(ModuleClassLoader.java:276) ~[?:?]
at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.defineClass(ClasspathManager.java:655) ~[?:?]
at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findClassImpl(ClasspathManager.java:578) ~[?:?]
at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClassImpl(ClasspathManager.java:538) ~[?:?]
at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClass(ClasspathManager.java:525) ~[?:?]
at org.eclipse.osgi.internal.loader.ModuleClassLoader.findLocalClass(ModuleClassLoader.java:328) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:368) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:446) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:395) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:387) ~[?:?]
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:150) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:?]
at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) ~[378:wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar:0.0.0]
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121) ~[?:?]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigDefaultXMLReader.parsePossibleDefaultAppConfigXMLFile(DataStoreAppConfigDefaultXMLReader.java:174) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigDefaultXMLReader.createDefaultInstance(DataStoreAppConfigDefaultXMLReader.java:126) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigMetadata.createDefaultInstance(DataStoreAppConfigMetadata.java:262) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigMetadata.setInitialAppConfig(DataStoreAppConfigMetadata.java:229) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigMetadata.access$200(DataStoreAppConfigMetadata.java:56) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigMetadata$1.onSuccess(DataStoreAppConfigMetadata.java:172) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at org.opendaylight.controller.blueprint.ext.DataStoreAppConfigMetadata$1.onSuccess(DataStoreAppConfigMetadata.java:165) ~[194:org.opendaylight.controller.blueprint:0.8.4.SNAPSHOT]
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1393) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1015) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:868) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:691) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:237) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:159) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1015) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:868) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:691) ~[38:com.google.guava:23.6.1.jre]
at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:48) ~[38:com.google.guava:23.6.1.jre]
at org.opendaylight.controller.cluster.datastore.messages.ReadData.processResponse(ReadData.java:40) ~[226:org.opendaylight.controller.sal-distributed-datastore:1.7.4.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContext$2.onComplete(RemoteTransactionContext.java:250) ~[226:org.opendaylight.controller.sal-distributed-datastore:1.7.4.SNAPSHOT]
at akka.dispatch.OnComplete.internal(Future.scala:261) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.OnComplete.internal(Future.scala:258) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:185) ~[44:com.typesafe.akka.actor:2.5.11]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) ~[353:org.scala-lang.scala-library:2.12.6.v20180426-183816-VFINAL-d7ae7a8]
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91) ~[44:com.typesafe.akka.actor:2.5.11]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) ~[353:org.scala-lang.scala-library:2.12.6.v20180426-183816-VFINAL-d7ae7a8]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) ~[353:org.scala-lang.scala-library:2.12.6.v20180426-183816-VFINAL-d7ae7a8]
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43) ~[44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [44:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [44:com.typesafe.akka.actor:2.5.11]
Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal cannot be found by wrap_file__tmp_karaf-0.8.4-SNAPSHOT_system_xerces_xercesImpl_2.11.0_xercesImpl-2.11.0.jar_0.0.0
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:433) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:395) ~[?:?]
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:387) ~[?:?]
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:150) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:?]
... 61 more{comment}
|
| Comments |
| Comment by Robert Varga [ 23/Oct/18 ] |
|
Ugh, this looks very weird. We are executing on akka's dispatch pool, from where we switch to xerces, which attempts to load org.w3c.dom, which should be available from the JRE, I think. This feels like a TCCL issue w.r.t. blueprint. |
| Comment by Robert Varga [ 23/Oct/18 ] |
|
tpantelis any ideas what is going on here? |
| Comment by Tom Pantelis [ 23/Oct/18 ] |
|
Not sure off the top of my head. I'd think org.w3c.dom would be exported from the JRE via the root class loader (I do see it listed in jre.properties in my local karaf install). Is this happening only in Oxygen SR3 and not <= Oxygen SR2? |
| Comment by Richard Kosegi [ 24/Oct/18 ] |
|
We just seen similar problem in fluorine : https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/daexim-csit-3node-clustering-basic-only-fluorine/249/odl_3/odl3_karaf.log.gz
|
| Comment by Robert Varga [ 24/Oct/18 ] |
|
This smells quite a bit. rkosegi ecelgp can we reproduce this with
org.eclipse.osgi/debug/loader = true
in etc/equinox-debug.properties ?
|
| Comment by Robert Varga [ 24/Oct/18 ] |
|
Looking at the xerces bundle, it should be able to resolve org.w3c.dom, as it has a optional import for that package, which seems to be resolved (from root bundle) in my local environment. We need that debug it identify which ModuleClassLoader is being used to resolve the class. |
| Comment by Robert Varga [ 24/Oct/18 ] |
|
Well, it seems it's the correct loader, trying to resolve an imported class: // 3) search the imported packages PackageSource source = findImportedSource(pkgName, null); if (source != null) { if (debug.DEBUG_LOADER) { Debug.println("BundleLoader[" + this + "] loading from import package: " + source); //$NON-NLS-1$ //$NON-NLS-2$ } // 3) found import source terminate search at the source result = source.loadClass(name); if (result != null) return result; throw new ClassNotFoundException(name + " cannot be found by " + this); //$NON-NLS-1$ } We need to know what "source" is. |
| Comment by Robert Varga [ 24/Oct/18 ] |
|
My working theory is that this has something to do with the order in which the bundles are deployed and activated. xercesImpl provides a DocumentBuilderFactory instance, via ServiceLoader, which uses TCCL – hence if xercesImpl.jar is visible via TCCL when UntrustedXML.class is loaded, it may pick it up as the DocumentBuilder. There are multiple ways to fix this:
|
| Comment by Robert Varga [ 24/Oct/18 ] |
|
exificient issue: https://github.com/EXIficient/exificient-grammars/issues/11 |
| Comment by Luis Gomez [ 25/Oct/18 ] |
|
Strangely I do not see the issue in stable/fluorine branch, maybe it is not consistent there. For oxygen is very consistent so would it be possible to cherry-pick https://git.opendaylight.org/gerrit/#/c/77266/ to stable/oxygen? |
| Comment by Robert Varga [ 25/Oct/18 ] |
|
done |
| Comment by Robert Varga [ 25/Oct/18 ] |
|
btw. org/w3c/dom/ElementTraversal is present in xml-apis-1.4.01, but not in Java 8. This is fixes with Java 9+, which includes ElementTraversal. |
| Comment by Luis Gomez [ 25/Oct/18 ] |
|
FYI this patch: https://git.opendaylight.org/gerrit/#/c/77274/ works in Oxygen. BTW, it does fix the problem in question but it may also bring or expose some other issues as this is also seen in the log of a failed test using the patch: 2018-10-26T00:31:40,438 | ERROR | Blueprint Extender: 2 | AbstractDataStore | 226 - org.opendaylight.controller.sal-distributed-datastore - 1.7.4.SNAPSHOT | Shard leaders failed to settle in 90 seconds, giving up In any case I keep my +1 as the original problem was consistent and the new one seems like more sporadic. |
| Comment by Robert Varga [ 07/Nov/18 ] |
|
All active release streams have this fixed. The NETCONF part of the fix for Neon needs some more work, but will complete for Neon. |