-
Bug
-
Resolution: Cannot Reproduce
-
None
-
2.0.5
-
None
-
Operating System: All
Platform: All
-
9044
The offender may be some project other than Odlparent.
This is a regression from Carbon, and it affects CSIT. If a workaround is found, the severity would be lower. Currently, this is a blocker for the Nitrogen release.
I was not able to reproduce this manually in a local environment (single node), so this might be something specific to the CSIT machines or to the Robot Framework SSH library.
A hard reset seems to be a necessary condition for this bug to appear; I have not seen it on 1node CSIT yet. It is possible, though, that the only difference is that the reset suite does not wait for ODL to finish booting for as long as the initial deploy script does.
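To test the "reset suite does not wait long enough" hypothesis, one could poll the karaf SSH port before opening the first session. The following is only a minimal sketch: the helper name wait_for_port and its parameters are hypothetical, it is not part of the CSIT suites, and 8101 is assumed to be the karaf SSH port (the karaf default). Note that an open port only shows that the listener is up; it does not prove that host key generation has finished.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connect to (host, port) succeeds.

    Returns True on the first successful connect, False if timeout_s
    elapses first. Hypothetical helper, for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection; success means
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Covers refused connections as well as connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Assumption: karaf SSH listens on its default port 8101 on this host.
    print(wait_for_port("127.0.0.1", 8101, timeout_s=5.0))
```

If the symptoms persist even with a generous wait, the boot-time difference between the reset suite and the initial deploy script is less likely to be the cause.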
So far I have seen two Robot symptoms:
+ "SSHException: Signature verification (ssh-dss) failed." [0]
+ "SSHException: No existing session" [1]
As the reset suite also connects to karaf ssh (to log a message), we see the first connection works:
2017-08-24 05:36:18,027 | INFO | 6]-nio2-thread-1 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36962
2017-08-24 05:36:18,032 | INFO | 6]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 129 - org.apache.sshd.core - 0.14.0 | Generating host key...
2017-08-24 05:36:42,338 | INFO | 6]-nio2-thread-2 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36968
2017-08-24 05:36:42,496 | INFO | 6]-nio2-thread-2 | LogAuditLoginModule | 109 - org.apache.karaf.jaas.modules - 4.0.9 | Authentication attempt - karaf
2017-08-24 05:36:42,498 | INFO | 6]-nio2-thread-2 | LogAuditLoginModule | 109 - org.apache.karaf.jaas.modules - 4.0.9 | Authentication succeeded - karaf
2017-08-24 05:36:42,503 | INFO | 6]-nio2-thread-2 | ServerUserAuthService | 129 - org.apache.sshd.core - 0.14.0 | Session karaf@/10.29.4.11:36968 authenticated
2017-08-24 05:36:42,771 | INFO | nsole user karaf | core | 112 - org.apache.karaf.log.core - 4.0.9 | ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-only-nitrogen/test/csit/suites/controller/dom_data_broker/restart_odl_with_tell_based_false.robot
After reset, it seems that it is the client that decides to refuse the connection [2]:
2017-08-22 21:09:43,294 | INFO | e]-nio2-thread-1 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58260
2017-08-22 21:09:43,301 | WARN | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Unable to read key /tmp/karaf-0.7.0/etc/host.key: java.lang.ClassNotFoundException: org.bouncycastle.jcajce.provider.asymmetric.rsa.BCRSAPrivateCrtKey cannot be found by org.apache.sshd.core_0.14.0
2017-08-22 21:09:43,304 | INFO | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Generating host key...
2017-08-22 21:09:51,923 | ERROR | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Overwriting key (host.key) is disabled: using throwaway java.security.KeyPair@17c31afb
2017-08-22 21:10:11,347 | INFO | e]-nio2-thread-3 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58266
2017-08-22 21:10:12,668 | INFO | e]-nio2-thread-7 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58274
2017-08-22 21:10:13,765 | INFO | e]-nio2-thread-2 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58276
2017-08-22 21:10:13,823 | WARN | e]-nio2-thread-5 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Exception caught
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_141]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_141]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_141]
at sun.nio.ch.IOUtil.read(IOUtil.java:197)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:387)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)[:1.8.0_141]
at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)[:1.8.0_141]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_141]
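To check other karaf.log files for the same host key messages, the pipe-separated lines can be filtered with a short script. This is a sketch under assumptions: the names KarafLine, parse_line and host_key_events are hypothetical, and the six-field layout (timestamp, level, thread, logger, bundle, message) is inferred from the excerpts above rather than from the logging configuration.

```python
from typing import Iterable, List, NamedTuple, Optional


class KarafLine(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


def parse_line(line: str) -> Optional[KarafLine]:
    """Split one karaf.log line into its six pipe-separated fields.

    Returns None for lines that do not match the layout, such as
    stack-trace frames or exception headers.
    """
    # maxsplit=5 keeps any '|' inside the message intact.
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None
    return KarafLine(*parts)


def host_key_events(lines: Iterable[str]) -> List[KarafLine]:
    """Collect WARN/ERROR records logged by SimpleGeneratorHostKeyProvider."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if (rec is not None
                and rec.logger == "SimpleGeneratorHostKeyProvider"
                and rec.level in ("WARN", "ERROR")):
            events.append(rec)
    return events
```

Running host_key_events over the lines of an odl1_karaf.log would list records such as the "Unable to read key" warning and the "Overwriting key (host.key) is disabled" error seen above, which makes it easier to tell whether a given failed run hit the same condition.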
The reset suite kills all ODL members, deletes several directories (including data, but preserving karaf.log), starts members and waits for jolokia to confirm all shards have their leaders elected.
[0] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-nitrogen/123/log.html.gz#s1-s2-k1-k1-k2-k3-k1-k1-k1-k1-k10
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-nitrogen/123/log.html.gz#s1-s2-k1-k1-k2-k3-k1-k1-k1-k1-k10
[2] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-nitrogen/123/odl1_karaf.log.gz
- is blocked by
-
ODLPARENT-86 Milestore: upgrade karaf to 4.1.2 or later
- Verified
- is duplicated by
-
NETVIRT-874 CSIT 3node failures - karaf connection rejected
- Resolved