odlparent / ODLPARENT-113

After hard reset, Robot fails to establish SSH connection to karaf


    • Type: Bug
    • Resolution: Cannot Reproduce
    • None
    • 2.0.5
    • General
    • None
    • Operating System: All
      Platform: All

    • 9044

      The offender may be some project other than Odlparent.

      This is a regression from Carbon, and it affects CSIT. If a workaround is found, the severity would be lower; currently, this is a blocker for the Nitrogen release.

      I was not able to reproduce this manually in a local environment (single node), so this might be specific to the CSIT machines or to the Robot Framework SSH library.

      A hard reset seems to be a necessary condition for this bug to appear; I have not seen it on 1-node CSIT yet. It is possible, however, that the only relevant difference is that the reset suite does not wait for ODL to finish booting as long as the initial deploy script does.
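
      If the difference really is boot time, one candidate workaround is to retry the SSH step until a deadline instead of failing on the first error; it is not verified here that this helps. The sketch assumes nothing about the SSH client: `retry_until` is a hypothetical helper that accepts any zero-argument callable (in CSIT that could wrap an SSHLibrary login to karaf) and re-raises the last exception once another sleep would overshoot the deadline. `flaky_login` only simulates two failures.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until(attempt: Callable[[], T], timeout: float, interval: float = 5.0) -> T:
    """Call attempt() until it returns without raising, or give up after timeout seconds.

    Hypothetical helper: on giving up, the last exception from attempt() propagates.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return attempt()
        except Exception:
            # Stop retrying if the next sleep would pass the deadline.
            if time.monotonic() + interval > deadline:
                raise
            time.sleep(interval)


calls = {"n": 0}


def flaky_login() -> str:
    # Hypothetical stand-in for an SSH login to karaf that fails twice, then works.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("SSHException: No existing session")
    return "karaf@root()>"


print(retry_until(flaky_login, timeout=1.0, interval=0.01))  # prints karaf@root()>
```

      A retry only masks a transient failure. If karaf keeps failing the same way on every attempt, the last exception surfaces after `timeout` seconds, which would itself be evidence against the boot-time explanation.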

      So far I have seen two Robot symptoms:
      + "SSHException: Signature verification (ssh-dss) failed." [0]
      + "SSHException: No existing session" [1]

      Because the reset suite also connects to the karaf SSH server (to log a message), we can see that the first connection works:
      2017-08-24 05:36:18,027 | INFO | 6]-nio2-thread-1 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36962
      2017-08-24 05:36:18,032 | INFO | 6]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 129 - org.apache.sshd.core - 0.14.0 | Generating host key...
      2017-08-24 05:36:42,338 | INFO | 6]-nio2-thread-2 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36968
      2017-08-24 05:36:42,496 | INFO | 6]-nio2-thread-2 | LogAuditLoginModule | 109 - org.apache.karaf.jaas.modules - 4.0.9 | Authentication attempt - karaf
      2017-08-24 05:36:42,498 | INFO | 6]-nio2-thread-2 | LogAuditLoginModule | 109 - org.apache.karaf.jaas.modules - 4.0.9 | Authentication succeeded - karaf
      2017-08-24 05:36:42,503 | INFO | 6]-nio2-thread-2 | ServerUserAuthService | 129 - org.apache.sshd.core - 0.14.0 | Session karaf@/10.29.4.11:36968 authenticated
      2017-08-24 05:36:42,771 | INFO | nsole user karaf | core | 112 - org.apache.karaf.log.core - 4.0.9 | ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-only-nitrogen/test/csit/suites/controller/dom_data_broker/restart_odl_with_tell_based_false.robot
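
      To compare excerpts like this mechanically, each karaf.log record can be split on " | " into timestamp, level, thread, logger, bundle and message. The sketch below assumes exactly that six-field layout as shown above; `session_summary` is a hypothetical helper, not part of Karaf or Apache SSHD. For each client address and port it reports whether a `ServerSession` created for it was later logged as authenticated by `ServerUserAuthService`.

```python
import re
from typing import Dict, Iterable

CREATED = re.compile(r"Server session created from /(\S+)")
AUTHENTICATED = re.compile(r"Session \S+@/(\S+) authenticated")


def session_summary(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each client 'address:port' to whether its SSH session authenticated."""
    sessions: Dict[str, bool] = {}
    for line in lines:
        fields = [f.strip() for f in line.strip().split(" | ", 5)]
        if len(fields) < 6:
            continue  # not a karaf.log record, e.g. a stack trace line
        logger, message = fields[3], fields[5]
        if logger == "ServerSession":
            m = CREATED.search(message)
            if m:
                sessions.setdefault(m.group(1), False)
        elif logger == "ServerUserAuthService":
            m = AUTHENTICATED.search(message)
            if m:
                sessions[m.group(1)] = True
    return sessions


SAMPLE = [
    "2017-08-24 05:36:18,027 | INFO | 6]-nio2-thread-1 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36962",
    "2017-08-24 05:36:42,338 | INFO | 6]-nio2-thread-2 | ServerSession | 129 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:36968",
    "2017-08-24 05:36:42,503 | INFO | 6]-nio2-thread-2 | ServerUserAuthService | 129 - org.apache.sshd.core - 0.14.0 | Session karaf@/10.29.4.11:36968 authenticated",
]
print(session_summary(SAMPLE))
# {'10.29.4.11:36962': False, '10.29.4.11:36968': True}
```

      Applied to the post-reset excerpt below (client ports 58260, 58266, 58274 and 58276), the same function reports all four sessions as unauthenticated, since that excerpt contains no `ServerUserAuthService` line.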

      After the reset, it seems to be the client that decides to refuse the connection [2]:
      2017-08-22 21:09:43,294 | INFO | e]-nio2-thread-1 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58260
      2017-08-22 21:09:43,301 | WARN | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Unable to read key /tmp/karaf-0.7.0/etc/host.key: java.lang.ClassNotFoundException: org.bouncycastle.jcajce.provider.asymmetric.rsa.BCRSAPrivateCrtKey cannot be found by org.apache.sshd.core_0.14.0
      2017-08-22 21:09:43,304 | INFO | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Generating host key...
      2017-08-22 21:09:51,923 | ERROR | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Overwriting key (host.key) is disabled: using throwaway java.security.KeyPair@17c31afb
      2017-08-22 21:10:11,347 | INFO | e]-nio2-thread-3 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58266
      2017-08-22 21:10:12,668 | INFO | e]-nio2-thread-7 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58274
      2017-08-22 21:10:13,765 | INFO | e]-nio2-thread-2 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:58276
      2017-08-22 21:10:13,823 | WARN | e]-nio2-thread-5 | ServerSession | 183 - org.apache.sshd.core - 0.14.0 | Exception caught
      java.io.IOException: Connection reset by peer
      at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_141]
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_141]
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_141]
      at sun.nio.ch.IOUtil.read(IOUtil.java:197)[:1.8.0_141]
      at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:387)[:1.8.0_141]
      at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)[:1.8.0_141]
      at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)[:1.8.0_141]
      at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)[:1.8.0_141]
      at java.lang.Thread.run(Thread.java:748)[:1.8.0_141]
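
      A visible server-side difference between the two excerpts: after the reset, `SimpleGeneratorHostKeyProvider` cannot read `/tmp/karaf-0.7.0/etc/host.key` (a `ClassNotFoundException` for a BouncyCastle class) and falls back to a throwaway key pair, while the first excerpt only shows "Generating host key...". Whether this is related to the client-side "Signature verification (ssh-dss) failed" error is not established in this report. The sketch below flags these lines in a collected karaf.log; `find_symptoms` and the symptom names are hypothetical, and matching is a plain substring search on message fragments copied from the excerpts.

```python
from typing import Iterable, List

# Message fragments copied from the post-reset karaf.log excerpt in this report.
SYMPTOMS = {
    "unreadable_host_key": "Unable to read key",
    "throwaway_host_key": "using throwaway",
    "connection_reset": "Connection reset by peer",
}


def find_symptoms(lines: Iterable[str]) -> List[str]:
    """Return the names of SYMPTOMS found in lines, in first-seen order."""
    found: List[str] = []
    for line in lines:
        for name, fragment in SYMPTOMS.items():
            if fragment in line and name not in found:
                found.append(name)
    return found


AFTER_RESET = [
    "2017-08-22 21:09:43,301 | WARN | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Unable to read key /tmp/karaf-0.7.0/etc/host.key: java.lang.ClassNotFoundException: org.bouncycastle.jcajce.provider.asymmetric.rsa.BCRSAPrivateCrtKey cannot be found by org.apache.sshd.core_0.14.0",
    "2017-08-22 21:09:51,923 | ERROR | e]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 183 - org.apache.sshd.core - 0.14.0 | Overwriting key (host.key) is disabled: using throwaway java.security.KeyPair@17c31afb",
    "java.io.IOException: Connection reset by peer",
]
BEFORE_RESET = [
    "2017-08-24 05:36:18,032 | INFO | 6]-nio2-thread-1 | SimpleGeneratorHostKeyProvider | 129 - org.apache.sshd.core - 0.14.0 | Generating host key...",
]
print(find_symptoms(AFTER_RESET))
# ['unreadable_host_key', 'throwaway_host_key', 'connection_reset']
print(find_symptoms(BEFORE_RESET))
# []
```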

      The reset suite kills all ODL members, deletes several directories (including data, but preserving karaf.log), starts the members, and waits until Jolokia confirms that all shards have elected their leaders.
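
      The report does not say which directories besides data are deleted, nor how karaf.log is preserved. A minimal local sketch of the deletion step, assuming Karaf's default log location `data/log/karaf.log` and a hypothetical helper `reset_karaf_dirs` (the real suite acts on remote nodes), moves the log aside, removes the named directories, and restores the log. As written, anything outside the listed directories, such as the `etc/host.key` file named in the log above, survives the reset.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def reset_karaf_dirs(karaf_home: Path, dirs: Iterable[str],
                     keep: str = "data/log/karaf.log") -> None:
    """Delete each directory in dirs under karaf_home, preserving one file.

    keep is relative to karaf_home (assumed Karaf default log location).
    It is moved to a temporary directory first and restored afterwards.
    """
    kept = karaf_home / keep
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / kept.name
        if kept.exists():
            shutil.move(str(kept), str(backup))
        for name in dirs:
            # ignore_errors=True: missing directories (and any other removal
            # errors) are silently skipped.
            shutil.rmtree(karaf_home / name, ignore_errors=True)
        if backup.exists():
            kept.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(backup), str(kept))


# Usage on a throwaway directory tree that mimics a karaf installation.
home = Path(tempfile.mkdtemp())
(home / "data" / "log").mkdir(parents=True)
(home / "data" / "log" / "karaf.log").write_text("ROBOT MESSAGE: Starting suite\n")
(home / "data" / "cache").mkdir()
(home / "etc").mkdir()
(home / "etc" / "host.key").write_text("placeholder")
reset_karaf_dirs(home, ["data"])
print(sorted(p.relative_to(home).as_posix() for p in home.rglob("*")))
# ['data', 'data/log', 'data/log/karaf.log', 'etc', 'etc/host.key']
```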

      [0] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-nitrogen/123/log.html.gz#s1-s2-k1-k1-k2-k3-k1-k1-k1-k1-k10
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-nitrogen/123/log.html.gz#s1-s2-k1-k1-k2-k3-k1-k1-k1-k1-k10
      [2] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-nitrogen/123/odl1_karaf.log.gz

            Stephen Kitt (skitt)
            Vratko Polak (vrpolak)