When the network connectivity between controller & NETCONF device is lost and packets are dropped because of network disruption along the path, it takes a long time until connection loss is detected and connection status is updated.
The following logs show that it took more than 15 minutes after sending a NETCONF keepalive probe before the connection timed out ('keepalive-delay' was 30 seconds):
22:31:29.954 DEBUG [globalWorkerGroup-3-7] Consume RemoteWindow[client](NettyAwareChannelSubsystem[id=0, recipient=43]-NetconfClientSessionImpl[netconf@/10.100.18.43:830][netconf]) by 325 down to 1277183 22:31:29.954 DEBUG [globalWorkerGroup-3-7] encode(NetconfClientSessionImpl[netconf@/10.100.18.43:830]) packet #17 sending command=94[SSH_MSG_CHANNEL_DATA] len=334 22:31:29.955 DEBUG [globalWorkerGroup-3-7] writeBuffer(Nio2Session[local=/172.17.0.2:33432, remote=/10.100.18.43:830]) writing 388 bytes 22:47:01.413 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] handleReadCycleFailure(Nio2Session[local=/172.17.0.2:33432, remote=/10.100.18.43:830]) IOException after 961462127645 nanos at read cycle=20: Connection timed out 22:47:01.416 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] exceptionCaught(Nio2Session[local=/172.17.0.2:33432, remote=/10.100.18.43:830]) caught IOException[Connection timed out] - calling handler 22:47:01.417 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] signalAuthFailure(NetconfClientSessionImpl[netconf@/10.100.18.43:830]) type=IOException, signalled=false, first=false: Connection timed out 22:47:01.417 WARN [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] exceptionCaught(NetconfClientSessionImpl[netconf@/10.100.18.43:830])[state=Opened] IOException: Connection timed out java.io.IOException: Connection timed out at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[?:?] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:47) ~[?:?] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:330) ~[?:?] at sun.nio.ch.IOUtil.read(IOUtil.java:296) ~[?:?] at sun.nio.ch.IOUtil.read(IOUtil.java:266) ~[?:?] at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:400) ~[?:?] at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:195) ~[?:?] at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:217) ~[?:?] at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:306) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?] 22:47:01.419 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] sessionException(NetconfClientSessionImpl[netconf@/10.100.18.43:830]) IOException: Connection timed out 22:47:01.420 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] close(NetconfClientSessionImpl[netconf@/10.100.18.43:830]) Closing immediately 22:47:01.420 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] signalAuthFailure(NetconfClientSessionImpl[netconf@/10.100.18.43:830]) type=SshException, signalled=false, first=false: Session is being closed 22:47:01.421 DEBUG [sshd-NetconfSshClient[42908dcc]-nio2-thread-6] close(ClientConnectionService[NetconfClientSessionImpl[netconf@/10.100.18.43:830]]) Closing immediately
When keepalive is enabled, it is reasonable to expect that the connection status is updated in the order of 'keepalive-delay'.
- is duplicated by
-
NETCONF-1175 TCP session is hanging when NETCONF session is dead
- Resolved
-
NETCONF-1195 Device Remain Connected after it get stopped
- Resolved
- relates to
-
NETCONF-1195 Device Remain Connected after it get stopped
- Resolved
-
NETCONF-1201 Long delay in connection status update after connection issue for multiple devices
- Resolved