[NETCONF-614] Netconf callhome server does not publish disconnect cause Created: 06/May/19  Updated: 07/May/19  Resolved: 07/May/19

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: Neon SR1, Sodium
Fix Version/s: Neon SR1, Sodium

Type: Bug Priority: Medium
Reporter: Luis Gomez Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

As shown here this regression has been introduced very recently:

https://jenkins.opendaylight.org/releng/job/netconf-csit-1node-callhome-only-neon/

The operational info just shows DISCONNECTED:

/restconf/operational/odl-netconf-callhome-server:netconf-callhome-server

{
  "netconf-callhome-server": {
    "allowed-devices": {
      "device": [
        {
          "unique-id": "netopeer",
          "callhome-status:device-status": "DISCONNECTED",
          "ssh-host-key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDF2UazsHh/amdGT2xpmn5rpSrybUl66cic082NMRS5qEyVyTc8mvgzj4m1gXP/cVOJmZ6TR9AxCUcNnc7MZMpemg4n0kawM4wwwWP43Z46UG342Dz1B1yCgi9WTzsF7/6J1d57xNYIVmX6IQBhg+hbglyM4L658mM22ZkZuWw2AMbMK46HFbRIfwjoqziFi8jjJWU5WtZL5YpCofRb9PkRuH/ejxs/NJXgKA9vEOf8pXQJ8bGNa8WKHBBqidaRdOWa7EMGPIr6bApzN4P/jiY1m3uodaVI64WOg88oc9h1RP6gvJkN8eSBFFLVNuyOgxyNpxXTAyJ84mlTJgDVfmF7"
        }
      ]
    }
  }
}

Karaf log shows this WARN:

2019-05-06T10:33:39,798 | WARN  | nioEventLoopGroup-4-2 | DefaultAuthFuture                | 132 - org.apache.sshd.osgi - 2.2.0 | notifyListener(DefaultAuthFuture[id=ssh-connection][value=org.apache.sshd.common.SshException: No more authentication methods available]) failed (NullPointerException) to invoke org.opendaylight.netconf.callhome.protocol.NetconfCallHomeServer$2@600d8dea: null

When it worked I can see this instead:

/restconf/operational/odl-netconf-callhome-server:netconf-callhome-server

{
  "netconf-callhome-server": {
    "allowed-devices": {
      "device": [
        {
          "unique-id": "netopeer",
          "callhome-status:device-status": "FAILED_AUTH_FAILURE",
          "ssh-host-key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDF2UazsHh/amdGT2xpmn5rpSrybUl66cic082NMRS5qEyVyTc8mvgzj4m1gXP/cVOJmZ6TR9AxCUcNnc7MZMpemg4n0kawM4wwwWP43Z46UG342Dz1B1yCgi9WTzsF7/6J1d57xNYIVmX6IQBhg+hbglyM4L658mM22ZkZuWw2AMbMK46HFbRIfwjoqziFi8jjJWU5WtZL5YpCofRb9PkRuH/ejxs/NJXgKA9vEOf8pXQJ8bGNa8WKHBBqidaRdOWa7EMGPIr6bApzN4P/jiY1m3uodaVI64WOg88oc9h1RP6gvJkN8eSBFFLVNuyOgxyNpxXTAyJ84mlTJgDVfmF7"
        }
      ]
    }
  }
}

And in the Karaf log:

2019-04-30T15:43:31,011 | INFO  | nioEventLoopGroup-4-2 | CallhomeStatusReporter           | 276 - org.opendaylight.netconf.callhome-provider - 1.6.1 | Setting auth failed status for callhome device id:netopeer.


 Comments   
Comment by Robert Varga [ 06/May/19 ]

This looks like the public key is not available after auth failure, hence NetconfCallHomeServer:144 will acquire a null and then it goes sideways. The same think is happening in Sodium.

Comment by Robert Varga [ 06/May/19 ]

Furthermore it seems device.getSshHostKey() is getting to be null for some reason. Grrr, gotta love this code

Comment by Robert Varga [ 06/May/19 ]

Actually, this seems to be the primary cause:

    @Override
    public void reportFailedAuth(final PublicKey sshKey) {
        AuthorizedKeysDecoder decoder = new AuthorizedKeysDecoder();

        for (Device device : getDevicesAsList()) {
            String keyString = device.getSshHostKey();

            try {
                PublicKey pubKey = decoder.decodePublicKey(keyString);
                if (sshKey.getAlgorithm().equals(pubKey.getAlgorithm()) && sshKey.equals(pubKey)) {

if keyString is null, decodePublicKey() will fail on it.

Comment by Robert Varga [ 06/May/19 ]

ecelgp can you check if we are getting warnings introduced by the proposed patch. Would it possible to dump config/oper DS in the errored case?

Comment by Luis Gomez [ 06/May/19 ]

Just added oper info when it fails vs when it works.

Comment by Robert Varga [ 07/May/19 ]

The DTCL NPEs are present even in successful runs – opened NETCONF-615 to track them.

Comment by Robert Varga [ 07/May/19 ]

This is the root cause: kex is not available, as it will be torn down immediately after KeyEstablished event, hence it must be captured before the callback returns.

Generated at Wed Feb 07 20:15:28 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.