[NETCONF-1167] Slow NETCONF RPC response caused by unreachable devices
Created: 26/Sep/23  Updated: 10/Jan/24  Resolved: 10/Jan/24

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: 5.0.7, 4.0.8, 6.0.4
Fix Version/s: 5.0.10, 6.0.6

Type: Bug
Priority: Medium
Reporter: Sangwook Ha
Assignee: Peter Suna
Resolution: Done
Votes: 0
Labels: pt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File karaf.log    

 Description   

NETCONF devices that are configured but unreachable may cause performance issues in other connected NETCONF sessions, or even cause those sessions to disconnect.

AsyncSshHandler makes a blocking call to verify the SSH session connection, so the calling thread is occupied for the whole timeout period when the device is unreachable. The default values of between-attempts-timeout-millis and connection-timeout-millis are 2000 (2 seconds) and 20000 (20 seconds), respectively. Assuming sleep-factor is 1.0 (the interval between connection attempts stays the same), each retry cycle blocks a thread for 20 of every 22 seconds, so a single unreachable device can take up about 0.9 CPU core. A relatively small number of unreachable devices can therefore overwhelm the whole thread pool available to all the NETCONF sessions, and even sessions without any connectivity issue can suffer: long latency in RPC responses and, in some cases, difficulty maintaining connectivity, depending on the keepalive settings on the device.
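The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes sleep-factor is 1.0 and that every connection attempt blocks for the full connection-timeout-millis. The function names are made up for illustration and are not part of the NETCONF code base.

```python
def occupied_fraction(connection_timeout_ms: int, between_attempts_ms: int) -> float:
    """Fraction of time one thread is blocked by a single unreachable device.

    Assumes each attempt blocks for the full connection timeout and the
    pause between attempts stays constant (sleep-factor 1.0).
    """
    cycle_ms = connection_timeout_ms + between_attempts_ms
    return connection_timeout_ms / cycle_ms


def threads_tied_up(unreachable_devices: int,
                    connection_timeout_ms: int = 20000,
                    between_attempts_ms: int = 2000) -> float:
    """Approximate number of threads kept busy by unreachable devices."""
    return unreachable_devices * occupied_fraction(
        connection_timeout_ms, between_attempts_ms)


if __name__ == "__main__":
    # Defaults from the description: 20 s timeout, 2 s between attempts.
    print(f"per device: {occupied_fraction(20000, 2000):.2f}")
    # Seven unreachable devices, as in the reproduction steps.
    print(f"7 devices:  {threads_tied_up(7):.2f} threads")
```

Under these assumptions, the number of blocked threads grows linearly with the number of unreachable devices, which is why a handful of them can starve a small shared pool.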



 Comments   
Comment by Peter Suna [ 27/Nov/23 ]

This issue is present in branches up to and including 6.0.x. The master branch is not affected.

Steps to reproduce:

  1. Start Karaf:
    feature:install odl-restconf-all odl-netconf-topology
    
  2. Drop incoming packets on the ports of seven test devices:
    sudo iptables -I INPUT -p tcp --dport 17840 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17841 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17842 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17843 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17844 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17845 -j DROP
    sudo iptables -I INPUT -p tcp --dport 17846 -j DROP
    
  3. Start 20 devices:
    java -jar netconf-testtool-6.0.6-SNAPSHOT-executable.jar --device-count 20 --ssh true --md-sal true --controller-ip 127.0.0.1 --controller-port 8181
    
  4. Get all devices:
    curl 'http://192.168.56.100:8181/rests/data/network-topology:network-topology/topology=topology-netconf' \
    --header 'Authorization: Basic YWRtaW46YWRtaW4='
    

Among the returned devices, there is at least one that is not connected immediately even though its port was not set to drop packets. Such devices connect only after some time.
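To spot the affected devices in the response from the last step, something like the following could be used. It is a minimal sketch that assumes the response is RESTCONF JSON with a "network-topology:topology" list whose nodes carry a "netconf-node-topology:connection-status" leaf. The sample payload and its node IDs are invented for illustration and are not taken from the attached log.

```python
import json


def not_connected(topology_json: str) -> list:
    """Return (node-id, status) pairs for nodes that are not yet connected.

    Assumes the JSON layout described in the lead-in; nodes without a
    status leaf are reported with status None.
    """
    payload = json.loads(topology_json)
    pending = []
    for topology in payload.get("network-topology:topology", []):
        for node in topology.get("node", []):
            status = node.get("netconf-node-topology:connection-status")
            if status != "connected":
                pending.append((node.get("node-id"), status))
    return pending


# Hypothetical response body, trimmed to the fields the sketch reads.
SAMPLE = json.dumps({
    "network-topology:topology": [{
        "topology-id": "topology-netconf",
        "node": [
            {"node-id": "device-a",
             "netconf-node-topology:connection-status": "connected"},
            {"node-id": "device-b",
             "netconf-node-topology:connection-status": "connecting"},
        ],
    }]
})

if __name__ == "__main__":
    for node_id, status in not_connected(SAMPLE):
        print(node_id, status)
```

Running the check repeatedly against the real response, and comparing the pending node IDs with the ports dropped in step 2, would show which devices are delayed without being blocked.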

Generated at Wed Feb 07 20:16:50 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.