[LISPMAP-72] loading compatible-with-all feature set hangs karaf console Created: 13/Oct/15  Updated: 19/Oct/17  Resolved: 05/Dec/15

Status: Resolved
Project: lispflowmapping
Component/s: Service
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: Jamo Luhrsen Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File 4462_karaf.log    
External issue ID: 4462

 Description   

The CI jobs are failing for those tests that run with the "compatible-with-all" feature
set. I can reproduce this locally: the karaf console hangs when this feature set is
loaded. The system's CPUs all run at 100% and I have to forcefully kill the
karaf process to recover.

The full karaf.log file is attached, but here is the final exception seen in the karaf.log
file when the problem is reproduced:

2015-10-13 17:11:07,730 | ERROR | rint Extender: 3 | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.4.2 | Unable to start blueprint container for bundle org.opendaylight.lispflowmapping.mappingservice.shell due to unresolved dependencies [(objectClass=org.opendaylight.lispflowmapping.interfaces.mappingservice.IMappingServiceShell)]
java.util.concurrent.TimeoutException
at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:336)[15:org.apache.aries.blueprint.core:1.4.2]
at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:48)[15:org.apache.aries.blueprint.core:1.4.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_85]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_85]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
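
A minimal sketch for pulling this kind of failure out of a karaf.log, assuming every entry uses the pipe-separated layout of the line above (timestamp, level, thread, logger, bundle, message). The function name `find_blueprint_failures` is hypothetical and not part of any OpenDaylight or Karaf tooling.

```python
import re

# Matches the message text of the blueprint failure shown above.
FAILURE = re.compile(
    r"Unable to start blueprint container for bundle (?P<bundle>\S+) "
    r"due to unresolved dependencies \[(?P<deps>.*)\]"
)


def find_blueprint_failures(log_text):
    """Return (timestamp, bundle, unresolved dependencies) for each failure."""
    failures = []
    for line in log_text.splitlines():
        # Assumed layout: timestamp | level | thread | logger | bundle | message
        fields = [field.strip() for field in line.split(" | ", 5)]
        if len(fields) != 6 or fields[1] != "ERROR":
            continue  # stack trace lines and other levels are skipped
        match = FAILURE.search(fields[5])
        if match:
            failures.append((fields[0], match.group("bundle"), match.group("deps")))
    return failures
```

Running it over the attached log would list each bundle whose blueprint container gave up waiting, together with the service filter it was waiting for.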



 Comments   
Comment by Jamo Luhrsen [ 13/Oct/15 ]

Attachment 4462_karaf.log has been added with description: karaf log file

Comment by Lori Jakab [ 13/Oct/15 ]

JamO, how do I reproduce this locally?

Comment by Lori Jakab [ 15/Oct/15 ]

Disabled blueprint timeout: https://git.opendaylight.org/gerrit/#/c/28372/

Comment by Luis Gomez [ 15/Oct/15 ]

I do not know if this is needed anymore but thanks for the patch.

Comment by Jamo Luhrsen [ 15/Oct/15 ]

(In reply to Lori Jakab from comment #2)
> JamO, how do I reproduce this locally?

Sorry, Lori, I missed that you asked for this.

The steps, for historical purposes:

1. karaf clean
2. feature:repo-add mvn:org.opendaylight.integration/features-integration-test/0.4.0-SNAPSHOT/xml/features
3. feature:install odl-integration-compatible-with-all

Comment by Jamo Luhrsen [ 15/Oct/15 ]

I'm not sure if I am misinformed, but I am still having trouble in this area.

I took a recent (10/15) distribution from nexus:

distribution-karaf-0.4.0-20151015.221211-1660.zip

then:

karaf clean

feature:repo-add mvn:org.opendaylight.integration/features-integration-test/0.4.0-SNAPSHOT/xml/features

opendaylight-user@root>feature:install odl-integration-compatible-with-all
Refreshing bundles org.apache.aries.util (9)
GossipRouter started at Thu Oct 15 23:21:20 UTC 2015
Listening on port 12001 bound on address 0.0.0.0/0.0.0.0
Backlog is 1000, linger timeout is 2000, and read timeout is 0

####karaf console is hung here####

After some time, the karaf.log file shows this:

2015-10-15 23:26:29,521 | ERROR | rint Extender: 3 | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.4.2 | Unable to start blueprint container for bundle org.opendaylight.lispflowmapping.mappingservice.shell due to unresolved dependencies [(objectClass=org.opendaylight.lispflowmapping.interfaces.mappingservice.IMappingServiceShell)]
java.util.concurrent.TimeoutException
at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:336)[15:org.apache.aries.blueprint.core:1.4.2]
at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:48)[15:org.apache.aries.blueprint.core:1.4.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_85]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_85]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]

Note: if I automate the above manual steps by putting the "features-integration-test" repo
and feature sets in the org.apache.karaf.features.cfg file, I don't end up with a hung karaf
console, but I still don't get a functional system (e.g. auth fails with basic REST calls, OpenFlow
ports don't come up) and I see the same lispflowmapping TimeoutException in the log.
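
A rough sketch of that automated variant, under two assumptions not stated in this comment: the file uses the standard Karaf keys `featuresRepositories` and `featuresBoot`, and each value sits on a single line with no backslash continuations. The helper name `add_boot_feature` is hypothetical.

```python
def add_boot_feature(cfg_text, repo_url, feature):
    """Append repo_url and feature to the assumed Karaf keys in cfg_text."""
    additions = {"featuresRepositories": repo_url, "featuresBoot": feature}
    out = []
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        extra = additions.get(key.strip())
        if sep and extra:
            # Split the comma-separated value and add the entry only once.
            items = [item.strip() for item in value.split(",") if item.strip()]
            if extra not in items:
                items.append(extra)
            line = f"{key.strip()} = {','.join(items)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

The arguments would be the repository URL and feature name from the manual steps (the features-integration-test URL and odl-integration-compatible-with-all). Commented-out lines are left untouched because their key does not match.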

Comment by Lori Jakab [ 16/Oct/15 ]

When you say the karaf console is hung, do you mean it doesn't return a prompt and won't accept input? The messages you show (like "Refreshing bundles org.apache.aries.util (9)") do not use the logging system and are just printed to System.out.

I tried reproducing the issue using two different distributions, distribution-karaf-0.4.0-20151015.221211-1660.tar.gz (the one you mentioned) and distribution-karaf-0.4.0-20151016.063152-1667.tar.gz. In neither case could I reproduce the issue; I got a functioning system, with RESTCONF working.

In any case, the TimeoutException is just a symptom of a broken system, so in fact I had my doubts about the usefulness of disabling the timeout in the patch referenced in Comment 3, because the exception actually uncovers problems that might otherwise go undetected.

That said, I don't understand why you still get the exception, considering the timeout was disabled (I checked the blueprint.xml file in the distribution files to see if my change from that patch was indeed present).

Not sure what else we could do to consider this bug fixed. From my POV, the fact that we get the exception only shows that LISP wasn't able to initialize correctly, but the cause of that lies somewhere else.

Comment by Vratko Polak [ 27/Oct/15 ]

> feature:install odl-integration-compatible-with-all

I think integration/test has everything needed for someone to whip up a suite and a job to test that.
Even if this Bug is fixed, we still want to test for this "console hang" symptom.

Created a Trello card [0] for that.

[0] https://trello.com/c/LJ4zwgYB/234-bug-4462-describes-a-failure-of-karaf-console-after-feature-install-create-a-suite-and-a-job-that-would-test-for-that-symptom
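
One way such a check for the "console hang" symptom could look, as a sketch only: run the command that performs the feature install with a deadline and treat a missed deadline as a hang. The function name `command_hangs` and the idea of wrapping the install in a single command are assumptions, not existing integration/test code.

```python
import subprocess


def command_hangs(cmd, timeout_s):
    """Return True if cmd has not exited within timeout_s seconds."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

A suite would pass in whatever command issues feature:install odl-integration-compatible-with-all against the running karaf instance, with a timeout generous enough for a healthy install.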

Comment by Jamo Luhrsen [ 27/Oct/15 ]

Thanks for the Trello card, Vratko. I'm still seeing issues when loading
compatible-with-all that I cannot nail down. I filed a new bug today:

https://bugs.opendaylight.org/show_bug.cgi?id=4535

I'm really perplexed as to why I see these issues across multiple
platforms but they are not seen in CSIT jobs.

Comment by Vratko Polak [ 16/Nov/15 ]

Is this Bug still present?

Comment by Jamo Luhrsen [ 05/Dec/15 ]

This has been fixed. I do not have the patch(es) handy to paste here,
but I have confirmed that it does not occur in the latest master distro or Lithium SR3.

Generated at Wed Feb 07 20:06:26 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.