[ODLPARENT-143] Pax Exam NotBoundException failures Created: 27/Feb/18  Updated: 06/Sep/21  Resolved: 06/Sep/21

Status: Resolved
Project: odlparent
Component/s: SFT
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Medium
Reporter: Michael Vorburger
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

This technically isn't an "odlparent bug", but I don't know which other JIRA project to put it into, and I believe it is useful to have an issue where we can discuss this problem, note findings, link possible future external issues, and perhaps one fine day in a far far away future upgrade Pax Exam under.

We hit this kind of problem fairly regularly, both from Pax Exam ITs (rarely, but only because we actually have very few real ITs) and from org.opendaylight.odlparent.featuretest.SingleFeatureTest (AKA the SFT; it hits this reasonably frequently, but only because we have lots of them, run implicitly and automatically for each odl-* feature in all ODL sub-projects):

java.rmi.NotBoundException: 8c248bd1-bb85-4fbe-a552-f46e7c70ee25
	at sun.rmi.registry.RegistryImpl.lookup(RegistryImpl.java:227)
	at sun.rmi.registry.RegistryImpl_Skel.dispatch(RegistryImpl_Skel.java:115)
	at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:472)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:299)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:283)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:260)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:375)
	at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:119)
	at org.ops4j.pax.exam.rbc.client.intern.RemoteBundleContextClientImpl.getRemoteBundleContext(RemoteBundleContextClientImpl.java:248)
	at org.ops4j.pax.exam.rbc.client.intern.RemoteBundleContextClientImpl.waitForState(RemoteBundleContextClientImpl.java:218)
	at org.ops4j.pax.exam.karaf.container.internal.KarafTestContainer.waitForState(KarafTestContainer.java:646)
	at org.ops4j.pax.exam.karaf.container.internal.KarafTestContainer.startKaraf(KarafTestContainer.java:253)
	at org.ops4j.pax.exam.karaf.container.internal.KarafTestContainer.start(KarafTestContainer.java:187)
	at org.ops4j.pax.exam.spi.reactors.AllConfinedStagedReactor.invoke(AllConfinedStagedReactor.java:79)
	at org.ops4j.pax.exam.junit.impl.ProbeRunner$2.evaluate(ProbeRunner.java:267)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.ops4j.pax.exam.junit.impl.ProbeRunner.run(ProbeRunner.java:98)
	at org.ops4j.pax.exam.junit.PaxExam.run(PaxExam.java:93)
	at org.opendaylight.odlparent.featuretest.PerFeatureRunner.run(PerFeatureRunner.java:72)
	at org.opendaylight.odlparent.featuretest.PerRepoTestRunner.runChild(PerRepoTestRunner.java:153)
	at org.opendaylight.odlparent.featuretest.PerRepoTestRunner.runChild(PerRepoTestRunner.java:28)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

We suspect that these are "just" transient problems: on slower Jenkins agent build VMs, Karaf sometimes takes longer to come up than Pax Exam (e.g. in SFT) is willing to wait (at least I, vorburger, have never hit this locally).

One solution for this may be to further increase some timeout, e.g. in SFT.
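
To make the timeout idea concrete, here is a minimal sketch of polling a registry lookup until a deadline, treating NotBoundException as "not up yet" rather than as an immediate failure. This is NOT Pax Exam's or SFT's actual code: RegistryLookupRetry, Lookup and lookupWithDeadline are hypothetical names made up for illustration, and only the java.rmi and java.time types come from the JDK.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.time.Duration;

// Hypothetical helper, for illustration only; not part of Pax Exam or odlparent.
final class RegistryLookupRetry {

    // Same shape as java.rmi.registry.Registry#lookup, so "registry::lookup" fits.
    @FunctionalInterface
    public interface Lookup {
        Remote lookup(String name) throws RemoteException, NotBoundException;
    }

    private RegistryLookupRetry() {
        // static helper only
    }

    /**
     * Retries the lookup while the name is not bound yet, and gives up
     * by rethrowing the last NotBoundException once the timeout has elapsed.
     */
    public static Remote lookupWithDeadline(Lookup lookup, String name,
            Duration timeout, Duration pollInterval)
            throws RemoteException, NotBoundException, InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return lookup.lookup(name);
            } catch (NotBoundException e) {
                // The container has not registered itself yet.
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // waited long enough, surface the failure
                }
                Thread.sleep(pollInterval.toMillis());
            }
        }
    }
}
```

With a real registry this would be called as something like RegistryLookupRetry.lookupWithDeadline(registry::lookup, name, Duration.ofMinutes(5), Duration.ofSeconds(1)), where the five minutes is an arbitrary example value. A longer timeout only makes the race less likely on slow VMs; it does not remove it.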

On https://lists.opendaylight.org/pipermail/release/2018-February/014115.html rovarga opines that this is something that could possibly be fixed in Pax Exam itself:

Yes, I do believe this is an issue with pax-exam integration. The fact
that the container is still booting should be known within the framework
(it is driven via pax-exam-container-karaf) and hence an attempt to
connect should not be made until the container is brought up.

Slow build VMs are actually very good at uncovering such racey
assumptions (i.e. it will boot in 2 minutes for sure).



 Comments   
Comment by Robert Varga [ 06/Sep/21 ]

These are pretty much non-existent and when they occur they indicate a problem in packaging (either Karaf or features under test). Our jobs now correctly keep the logs from such failures, so they can be diagnosed.

Generated at Wed Feb 07 20:27:45 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.