[NETCONF-1097] NETCONF connection status behavior change Created: 20/Jul/23  Updated: 07/Aug/23  Resolved: 07/Aug/23

Status: Resolved
Project: netconf
Component/s: netconf-topology
Affects Version/s: 6.0.0, 7.0.0
Fix Version/s: 7.0.0, 6.0.1

Type: Bug Priority: Medium
Reporter: Sangwook Ha Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: pt, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
split to NETCONF-1119 Create unit tests for NETCONF connect... Resolved
split to NETCONF-1121 Refactor RemoteDeviceHandler Confirmed
Relates
relates to NETCONF-1070 Rework client reconnection logic Resolved

 Description   

NETCONF-1070, which implements a new NETCONF device connection mechanism, changed how connection-status is updated during reconnection attempts.

Previously, connection-status stayed in the connecting state while reconnection was being attempted, and then transitioned to either connected (successful connection) or unable-to-connect (eventual connection failure).

But with netconf-6.0.0, when a connection attempt fails, connection-status changes to unable-to-connect and stays in that state until a connection succeeds and the status changes to connected.

This makes it impossible to distinguish the following two cases with connection-status:

  • Reconnection is still being tried
  • Reconnection attempts were exhausted and device cannot be connected

And since the state transitions from unable-to-connect to connected when the initial connection attempt fails, this may also affect downstream applications that expect connection-status to change from connecting to connected, e.g. TransportPCE NetConfTopologyListener.
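
The effect on such a downstream listener can be illustrated with a minimal Python sketch. The two status sequences are assumptions made up for illustration (one failed attempt followed by a successful one), and count_connected_events is a hypothetical stand-in, not the actual TransportPCE code:

```python
def count_connected_events(statuses):
    """Count the connecting -> connected transitions in a sequence of statuses."""
    return sum(
        1
        for previous, current in zip(statuses, statuses[1:])
        if previous == "connecting" and current == "connected"
    )


# Assumed sequences: one failed attempt, then a successful one.
before_netconf_1070 = ["connecting", "connecting", "connected"]
with_netconf_6_0_0 = ["connecting", "unable-to-connect", "connected"]

seen_before = count_connected_events(before_netconf_1070)
seen_with_6_0_0 = count_connected_events(with_netconf_6_0_0)
print(seen_before, seen_with_6_0_0)
```

Under these assumptions, a listener keyed on the connecting to connected transition fires for the first sequence but never for the second.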



 Comments   
Comment by Guillaume Lambert [ 21/Jul/23 ]

gthouenon orenais FYI

Comment by Ivan Hrasko [ 25/Jul/23 ]

We have to ensure that:

  1. when we are below max_attempts the device status is connecting
  2. when we are exceeding max_attempts the device status is unable-to-connect
  3. when device is turned off and on again we need to ensure transition from connecting to connected
  4. when device is turned off and on again we need to ensure that device stays in unable-to-connect

All these requirements are satisfied in 5.0.x branch.

Comment by Ivan Hrasko [ 25/Jul/23 ]

Item #3 above can be tested by running testtool (which automatically creates 17830-sim-device) while creating another topology node, for example test-device, with the same configuration as 17830-sim-device.

Now, after turning testtool off and on again, we get an incorrect response:

GET http://192.168.56.25:8181/rests/data/network-topology:network-topology/topology=topology-netconf?fields=node/node-id;node/netconf-node-topology:connection-status
{
    "network-topology:topology": [
        {
            "node": [
                {
                    "node-id": "17830-sim-device",
                    "netconf-node-topology:connection-status": "connected"
                },
                {
                    "node-id": "test-device",
                    "netconf-node-topology:connection-status": "unable-to-connect"
                }
            ]
        }
    ]
}

We expect both devices to be connected, assuming the devices' max_attempts limit for reconnection has not been exceeded.
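
A check of this kind could be scripted roughly as follows. This is only a sketch: the response body is copied from the output above, and nodes_not_connected is a hypothetical helper name, not part of any NETCONF tooling:

```python
import json

# Response body copied from the GET request above.
RESPONSE = """
{
    "network-topology:topology": [
        {
            "node": [
                {
                    "node-id": "17830-sim-device",
                    "netconf-node-topology:connection-status": "connected"
                },
                {
                    "node-id": "test-device",
                    "netconf-node-topology:connection-status": "unable-to-connect"
                }
            ]
        }
    ]
}
"""

STATUS_KEY = "netconf-node-topology:connection-status"


def nodes_not_connected(body):
    """Return the node-ids whose connection-status is anything but connected."""
    topology = json.loads(body)["network-topology:topology"][0]
    return [
        node["node-id"]
        for node in topology.get("node", [])
        if node.get(STATUS_KEY) != "connected"
    ]


print(nodes_not_connected(RESPONSE))  # ['test-device']
```

An empty list would correspond to the expected result, where both devices are connected.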

Comment by Ivan Hrasko [ 25/Jul/23 ]

...

"netconf-node-topology:max-connection-attempts": 0

...

is used to set unlimited connection attempts.
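
Combined with requirements 1 and 2 listed earlier, the expected status could be sketched in Python as below. This is not the actual netconf-topology logic: expected_status and its parameters are hypothetical names, and treating a failure count equal to the limit as exhausted is an assumption, since the requirements only say "below" and "exceeding".

```python
def expected_status(failed_attempts, max_attempts, session_up=False):
    """Return the connection-status expected for a node.

    failed_attempts: consecutive failed connection attempts so far (hypothetical counter)
    max_attempts: value of max-connection-attempts, where 0 means unlimited
    session_up: True once a NETCONF session has been established
    """
    if session_up:
        return "connected"
    if max_attempts == 0:
        # Unlimited attempts: reconnection never gives up.
        return "connecting"
    # Assumption: reaching the limit already counts as exhausted.
    if failed_attempts < max_attempts:
        return "connecting"
    return "unable-to-connect"
```

For example, expected_status(3, 5) gives connecting, expected_status(5, 5) gives unable-to-connect, and expected_status(100, 0) stays at connecting because the limit is disabled.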

Comment by Ivan Hrasko [ 04/Aug/23 ]

Ready for review. To prevent further occurrences of similar issues, we will create unit tests in NETCONF-1119.

Comment by Robert Varga [ 05/Aug/23 ]

This issue shows that we are quite lacking in terms of API definition. The callbacks in RemoteDeviceHandler have no attached meaning, but implementation experience shows there is an implicit lifecycle contract expressed therein.

Generated at Wed Feb 07 20:16:40 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.