[NETCONF-614] Netconf callhome server does not publish disconnect cause Created: 06/May/19 Updated: 07/May/19 Resolved: 07/May/19 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | Neon SR1, Sodium |
| Fix Version/s: | Neon SR1, Sodium |
| Type: | Bug | Priority: | Medium |
| Reporter: | Luis Gomez | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
As shown here this regression has been introduced very recently: https://jenkins.opendaylight.org/releng/job/netconf-csit-1node-callhome-only-neon/ The operational info just shows DISCONNECTED: /restconf/operational/odl-netconf-callhome-server:netconf-callhome-server
{
"netconf-callhome-server": {
"allowed-devices": {
"device": [
{
"unique-id": "netopeer",
"callhome-status:device-status": "DISCONNECTED",
"ssh-host-key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDF2UazsHh/amdGT2xpmn5rpSrybUl66cic082NMRS5qEyVyTc8mvgzj4m1gXP/cVOJmZ6TR9AxCUcNnc7MZMpemg4n0kawM4wwwWP43Z46UG342Dz1B1yCgi9WTzsF7/6J1d57xNYIVmX6IQBhg+hbglyM4L658mM22ZkZuWw2AMbMK46HFbRIfwjoqziFi8jjJWU5WtZL5YpCofRb9PkRuH/ejxs/NJXgKA9vEOf8pXQJ8bGNa8WKHBBqidaRdOWa7EMGPIr6bApzN4P/jiY1m3uodaVI64WOg88oc9h1RP6gvJkN8eSBFFLVNuyOgxyNpxXTAyJ84mlTJgDVfmF7"
}
]
}
}
}
Karaf log shows this WARN:
2019-05-06T10:33:39,798 | WARN | nioEventLoopGroup-4-2 | DefaultAuthFuture | 132 - org.apache.sshd.osgi - 2.2.0 | notifyListener(DefaultAuthFuture[id=ssh-connection][value=org.apache.sshd.common.SshException: No more authentication methods available]) failed (NullPointerException) to invoke org.opendaylight.netconf.callhome.protocol.NetconfCallHomeServer$2@600d8dea: null
When it worked I can see this instead: /restconf/operational/odl-netconf-callhome-server:netconf-callhome-server
{
"netconf-callhome-server": {
"allowed-devices": {
"device": [
{
"unique-id": "netopeer",
"callhome-status:device-status": "FAILED_AUTH_FAILURE",
"ssh-host-key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDF2UazsHh/amdGT2xpmn5rpSrybUl66cic082NMRS5qEyVyTc8mvgzj4m1gXP/cVOJmZ6TR9AxCUcNnc7MZMpemg4n0kawM4wwwWP43Z46UG342Dz1B1yCgi9WTzsF7/6J1d57xNYIVmX6IQBhg+hbglyM4L658mM22ZkZuWw2AMbMK46HFbRIfwjoqziFi8jjJWU5WtZL5YpCofRb9PkRuH/ejxs/NJXgKA9vEOf8pXQJ8bGNa8WKHBBqidaRdOWa7EMGPIr6bApzN4P/jiY1m3uodaVI64WOg88oc9h1RP6gvJkN8eSBFFLVNuyOgxyNpxXTAyJ84mlTJgDVfmF7"
}
]
}
}
}
And in the Karaf log:
2019-04-30T15:43:31,011 | INFO | nioEventLoopGroup-4-2 | CallhomeStatusReporter | 276 - org.opendaylight.netconf.callhome-provider - 1.6.1 | Setting auth failed status for callhome device id:netopeer.
|
| Comments |
| Comment by Robert Varga [ 06/May/19 ] |
|
This looks like the public key is not available after auth failure, hence NetconfCallHomeServer:144 will acquire a null and then it goes sideways. The same think is happening in Sodium. |
| Comment by Robert Varga [ 06/May/19 ] |
|
Furthermore it seems device.getSshHostKey() is getting to be null for some reason. Grrr, gotta love this code |
| Comment by Robert Varga [ 06/May/19 ] |
|
Actually, this seems to be the primary cause: @Override
public void reportFailedAuth(final PublicKey sshKey) {
AuthorizedKeysDecoder decoder = new AuthorizedKeysDecoder();
for (Device device : getDevicesAsList()) {
String keyString = device.getSshHostKey();
try {
PublicKey pubKey = decoder.decodePublicKey(keyString);
if (sshKey.getAlgorithm().equals(pubKey.getAlgorithm()) && sshKey.equals(pubKey)) {
if keyString is null, decodePublicKey() will fail on it. |
| Comment by Robert Varga [ 06/May/19 ] |
|
ecelgp can you check if we are getting warnings introduced by the proposed patch. Would it possible to dump config/oper DS in the errored case? |
| Comment by Luis Gomez [ 06/May/19 ] |
|
Just added oper info when it fails vs when it works. |
| Comment by Robert Varga [ 07/May/19 ] |
|
The DTCL NPEs are present even in successful runs – opened |
| Comment by Robert Varga [ 07/May/19 ] |
|
This is the root cause: kex is not available, as it will be torn down immediately after KeyEstablished event, hence it must be captured before the callback returns. |