[ODLPARENT-139] Karaf shell over SSH disconnects on feature:install netvirt-openstack Created: 30/Jan/18  Updated: 22/Aug/19  Resolved: 22/Aug/19

Status: Resolved
Project: odlparent
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: High
Reporter: Daniel Farrell
Assignee: Stephen Kitt
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

My new packaging tests are catching a problem with all recent Oxygen snapshots.

If I run feature:install odl-netvirt-openstack in a Karaf shell over SSH, the connection is dropped. If I do the same install in a local Karaf shell
(./karaf vs. ./start and ssh -p 8101 karaf@localhost), it works. SSH works for recent Nitrogen snapshots.

I reproduced it locally with a tarball, not an RPM, to make sure my packaging logic wasn't the problem.

https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/karaf/0.8.0-SNAPSHOT/karaf-0.8.0-20180129.175223-1356.tar.gz

You can see it failing in all the recent packaging-build-rpm-snap jobs (since I added test logic that finds it).

https://jenkins.opendaylight.org/releng/job/packaging-build-rpm-snap-oxygen/

Here's the karaf.log (no errors/warns):

https://pastebin.com/uewzEhVd



 Comments   
Comment by Jamo Luhrsen [ 30/Jan/18 ]

let me know if I can do anything to test/debug that would help

Comment by Michael Vorburger [ 30/Jan/18 ]

skitt said during the Kernel call that this must be another case of those pesky feature reload problems: the SSH server in Karaf restarts as an (unintended) consequence of the feature:install odl-netvirt-openstack. skitt can usually figure out what's causing those bundle reloads and conjure up a fix. In the meantime though, dfarrell07 and jluhrsen, I bet that using feature:install --no-auto-refresh odl-netvirt-openstack works around this problem?

Comment by Daniel Farrell [ 31/Jan/18 ]

> I bet you guys that using feature:install --no-auto-refresh odl-netvirt-openstack probably works around this problem?

vorburger I tried it locally and in the sandbox, and sadly no, it didn't work around it.

https://jenkins.opendaylight.org/sandbox/job/packaging-test-rpm-master/22/console

Here's another karaf.log, but this time I waited longer before collecting it (since ODL was still running after it dropped my SSH connection)

https://pastebin.com/iB8xZADJ

It now shows some warnings related to blueprint, maybe relevant?

Comment by Daniel Farrell [ 05/Feb/18 ]

I guess this should be a blocker for Oxygen, right? It's a major, user-facing regression from Nitrogen. klou

Comment by Daniel Farrell [ 05/Feb/18 ]

jluhrsen et al - can you just sanity check that you also see this locally? It should be easy to reproduce.

Comment by Jamo Luhrsen [ 05/Feb/18 ]

yeah, I can reproduce this:

  • sudo systemctl start opendaylight
  • ssh -p 8101 karaf@127.0.0.1
  • feature:install odl-netvirt-openstack

(after approx. 60s, the ssh connection is closed)
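
The steps above can be scripted so that a test can time how long the session survives. What follows is only a minimal Python sketch. It assumes the Karaf console accepts a command as an ssh argument and that the OpenSSH client reports a lost connection with exit status 255. The helper names karaf_ssh_command and run_over_ssh are hypothetical, and the defaults (port 8101, user karaf, 127.0.0.1) are taken from the steps above.

```python
import subprocess
import time
from typing import List, Tuple

# Assumption: the OpenSSH client exits with 255 when the connection itself fails.
SSH_CONNECTION_ERROR = 255


def karaf_ssh_command(karaf_command: str, host: str = "127.0.0.1",
                      port: int = 8101, user: str = "karaf") -> List[str]:
    """Build the argv for running one Karaf console command over SSH."""
    return ["ssh", "-p", str(port), f"{user}@{host}", karaf_command]


def run_over_ssh(karaf_command: str) -> Tuple[int, float]:
    """Run the command and return (exit status, elapsed seconds).

    ssh may prompt for the karaf password unless key-based login is set up.
    """
    start = time.monotonic()
    result = subprocess.run(karaf_ssh_command(karaf_command))
    return result.returncode, time.monotonic() - start
```

A caller could run run_over_ssh("feature:install odl-netvirt-openstack") and treat a SSH_CONNECTION_ERROR status after roughly 60 seconds as a reproduction of the disconnect.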

I can ssh back in though, so maybe this is not a complete blocker. But it does break our automation
and tests so it would be nice to get fixed.

dfarrell07, maybe we can change our tests to reconnect and continue their work, instead of failing
when the ssh connection is dropped.
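
One way the reconnect idea could look in a test harness is a small retry wrapper. This is a hedged Python sketch, not the packaging tests' actual code. The name run_with_reconnect is hypothetical, the wrapper assumes a dropped session shows up as ssh exit status 255, and it assumes that re-running the step after a reconnect is safe.

```python
import time
from typing import Callable

# Assumption: the OpenSSH client exits with 255 when the connection is lost.
SSH_CONNECTION_ERROR = 255


def run_with_reconnect(run_once: Callable[[], int],
                       max_attempts: int = 3,
                       delay_seconds: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_once until it returns something other than a connection error.

    run_once opens a fresh SSH session, runs the step, and returns the exit
    status. The last status is returned if every attempt loses the connection.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    returncode = SSH_CONNECTION_ERROR
    for attempt in range(1, max_attempts + 1):
        returncode = run_once()
        if returncode != SSH_CONNECTION_ERROR:
            return returncode
        if attempt < max_attempts:
            # Give the Karaf SSH server time to come back before reconnecting.
            sleep(delay_seconds)
    return returncode
```

Passing the session runner in as a callable keeps the retry logic testable without a running controller. A real test would likely verify the installed feature after reconnecting rather than blindly repeating the install.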

I also confirmed that this problem does not occur in a recent Nitrogen distro.

Comment by Kit Lou [ 05/Feb/18 ]

I have marked this issue as a blocker for Oxygen.

Comment by Stephen Kitt [ 06/Feb/18 ]

dfarrell07 yes, I can reproduce it too; that’s why I marked it “confirmed” a few days ago.

Comment by Stephen Kitt [ 06/Feb/18 ]

Related: https://git.opendaylight.org/gerrit/67966

Comment by Kit Lou [ 27/Feb/18 ]

Please provide an update on this issue. Is this a true blocker? Any ETA on resolution?

Comment by Daniel Farrell [ 07/Mar/18 ]

Talked with skitt about this and we seem to agree that it should be downgraded from blocker. It's a major bug that may impact downstream automation, but it's not going to be easy/realistic to fix quickly, and since there's a work-around (reconnect to SSH) maybe it shouldn't block the release.

Comment by Robert Varga [ 22/Aug/19 ]

I cannot reproduce this with odlparent-5.0.0
