[NETCONF-850] Cannot reconnect on second time Created: 24/Jan/22  Updated: 11/Feb/22  Resolved: 11/Feb/22

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: 2.0.11
Fix Version/s: 2.0.13, 1.13.8

Type: Bug Priority: High
Reporter: Martin Sunal Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: karaf.log, mount-all.txt
Issue Links:
Duplicate
duplicates NETCONF-852 NETCONF mount fails during reconnect Resolved

 Description   

The ODL NETCONF client cannot reconnect to the devices once it has already reconnected to them one time.

Distribution: onap-karaf-0.15.1
Features installed: odl-netconf-topology, odl-restconf-nb-rfc8040, odl-mdsal-apidocs, jolokia

Steps to reproduce:

 1. Start netconf-testtool with 10 devices:

java -Djava.security.egd=file:/dev/./urandom -jar netconf-testtool-2.0.11-executable.jar --ssh true --device-count 10

 2. Mount all devices (mount-all.txt). All devices reach the "connected" state and data can be fetched, for example:

GET http://localhost:8181/rests/data/network-topology:network-topology/topology=topology-netconf/node=1/yang-ext:mount/ietf-netconf-monitoring:netconf-state?content=nonconfig

 3. Kill netconf-testtool. All devices switch to the "connecting" state, and the fetch from step 2 returns:

{{ "errors": { "error": [ { "error-tag": "resource-denied-transport", "error-message": "Mount point does not exist.", "error-type": "protocol" } ] }}

All of this is expected.

 4. Start the testtool as in step 1 and confirm that everything works as in step 2.

 5. Kill netconf-testtool again. All devices remain in the "connected" state (this is the problem), and the fetch from step 2 returns 500 Internal Server Error:

{{ "errors": { "error": [ { "error-tag": "operation-failed", "error-info": "java.lang.IllegalArgumentException: Unable to read data: Optional[/(urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring?revision=2010-10-04)netconf-state], errors: [RpcError [message=Channel closed, severity=ERROR, errorType=TRANSPORT, tag=operation-failed, applicationTag=null, info=null, cause=null]]", "error-message": "Transaction failed", "error-type": "application" } ] }}

Keepalives keep running, and the log shows:

10:53:47.212 WARN [globalWorkerGroup-3-9] RemoteDevice{1}: Keepalive RPC failed with error: [RpcError [message=Channel closed, severity=ERROR, errorType=TRANSPORT, tag=operation-failed, applicationTag=null, info=null, cause=null]]

 6. Start the testtool as in step 1. ODL does not reconnect the devices and stays stuck in the state from step 5; only unmounting and remounting the devices reconnects them.
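Repeating the fetch from steps 2, 3 and 5 by hand for all ten devices is tedious, so it can be scripted. The following is a minimal Python sketch using only the standard library: it builds the per-node URL from step 2 and returns the HTTP status code and response body. It assumes ODL listens on localhost:8181, that the mounted nodes are named "1" to "10" as in the example URL (the real node names come from mount-all.txt, which is not reproduced here), and that the RESTCONF credentials are admin/admin. The function names `state_url`, `basic_auth` and `probe` are hypothetical.

```python
import base64
import urllib.error
import urllib.request

# Assumed ODL RESTCONF endpoint and path, taken from the GET in step 2.
BASE = ("http://localhost:8181/rests/data/network-topology:network-topology"
        "/topology=topology-netconf")
STATE_PATH = "/yang-ext:mount/ietf-netconf-monitoring:netconf-state?content=nonconfig"


def state_url(node_id):
    """URL of the netconf-state container behind the mount point of one node."""
    return f"{BASE}/node={node_id}{STATE_PATH}"


def basic_auth(user, password):
    """Value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def probe(node_id, user="admin", password="admin", timeout=10.0):
    """GET netconf-state for one node; return (HTTP status, body text).

    Error responses (such as the 500 from step 5) are returned rather than
    raised, so their RESTCONF error body can be inspected.
    """
    req = urllib.request.Request(state_url(node_id))
    req.add_header("Authorization", basic_auth(user, password))
    req.add_header("Accept", "application/json")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()
```

For example, `for i in range(1, 11): print(i, *probe(str(i)))` prints one line per device. Per the report, in step 5 the fetch returns 500 with the "Channel closed" error shown above, even though the devices are still reported as "connected".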

The problem is in steps 5 and 6: ODL reports "connected" although there is no connection to the NETCONF testtool, and it does not reconnect automatically when the testtool is started again.
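When scripting the reproduction, the expected outage (step 3) can be told apart from the stuck state (steps 5 and 6) by the RESTCONF error body. The following Python sketch is not part of ODL: the function names and the returned labels are hypothetical, and `reported_status` is assumed to be the node's status as read from ODL (for example the `netconf-node-topology:connection-status` leaf of the node). It flags the inconsistency described above: a node reported as "connected" whose mount point read fails with "Channel closed".

```python
import json


def error_entries(body):
    """Return the list of RESTCONF error entries in a JSON body, or [] if none."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return []
    errors = doc.get("errors") if isinstance(doc, dict) else None
    if not isinstance(errors, dict):
        return []
    entries = errors.get("error", [])
    return entries if isinstance(entries, list) else []


def classify(reported_status, http_code, body):
    """Label one netconf-state fetch (hypothetical labels, for illustration)."""
    errors = error_entries(body)
    if http_code == 200 and not errors:
        return "ok"
    tags = {e.get("error-tag") for e in errors}
    # Concatenate message and info so "Channel closed" is found in either field.
    text = " ".join(f"{e.get('error-message', '')} {e.get('error-info', '')}"
                    for e in errors)
    if "resource-denied-transport" in tags:
        # Step 3: the mount point is gone while the device is unreachable (expected).
        return "mount-point-gone"
    if reported_status == "connected" and "Channel closed" in text:
        # Steps 5 and 6: ODL still reports "connected" but the channel is closed.
        return "stale-connected"
    return "other-error"
```

A check like this only detects the symptom; it does not explain or fix the underlying reconnect logic.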



 Comments   
Comment by Peter Puškár [ 11/Feb/22 ]

Confirmed that this issue is present on Phosphorus SR-1; however, after a retest on master it no longer reproduces.

It seems to have been fixed by this patch:

99462: Do not reuse inactive handler

Generated at Wed Feb 07 20:16:04 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.