[NETCONF-827] Mount loop when setting too low connection-timeout Created: 02/Nov/21  Updated: 31/Oct/22  Resolved: 13/Oct/22

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: 3.0.0, 4.0.0, 2.0.14, 2.0.16, 3.0.6, 4.0.2
Fix Version/s: 3.0.7, 4.0.3, 2.0.17

Type: Bug Priority: Medium
Reporter: Jaroslav Tóth Assignee: Ivan Martiniak
Resolution: Done Votes: 0
Labels: pt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File karaf.log    
Issue Links:
Duplicate
is duplicated by NETCONF-880 Netconf does not close device mountpo... Resolved
Issue split
split to NETCONF-905 NetconfSessionPromise reconnection fa... Resolved

 Description   

 

After setting of small value for connection-timeout:

{
    "node": [
        {
            "node-id": "testtool",
            "netconf-node-topology:host": "127.0.0.1",
            "netconf-node-topology:port": 36000,
            "netconf-node-topology:keepalive-delay": 10,
            "netconf-node-topology:tcp-only": false,
            "netconf-node-topology:username": "admin",
            "netconf-node-topology:password": "admin",
            "netconf-node-topology:connection-timeout-millis": 100
        }
    ]
}

sometimes the race condition between processing of received NETCONF hello message from server and cancellation of connection attempt occurs; symptoms:

021-11-02T20:28:47,433 | WARN  | nioEventLoopGroupCloseable-3-12 | AbstractNetconfSessionNegotiator | 274 - org.opendaylight.netconf.netty-util - 1.9.1 | An exception occurred during negotiation with nulljava.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536])
    at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:99) ~[bundleFile:4.1.52.Final]
    at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:79) ~[?:?]
    at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:27) ~[?:?]
    at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationSuccessful(AbstractNetconfSessionNegotiator.java:284) ~[?:?]
    at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.access$100(NetconfClientSessionNegotiator.java:43) ~[?:?]
    at org.opendaylight.netconf.client.NetconfClientSessionNegotiator$ExiConfirmationInboundHandler.channelRead(NetconfClientSessionNegotiator.java:216) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [bundleFile:4.1.52.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [bundleFile:4.1.52.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.52.Final]
    at java.lang.Thread.run(Thread.java:829) [?:?]
2021-11-02T20:28:47,437 | WARN  | nioEventLoopGroupCloseable-3-12 | AbstractChannelHandlerContext    | 53 - io.netty.common - 4.1.52.Final | An exception 'java.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536])' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:java.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536])
    at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:99) ~[bundleFile:4.1.52.Final]
    at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:79) ~[?:?]
    at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:27) ~[?:?]
    at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationSuccessful(AbstractNetconfSessionNegotiator.java:284) ~[?:?]
    at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.access$100(NetconfClientSessionNegotiator.java:43) ~[?:?]
    at org.opendaylight.netconf.client.NetconfClientSessionNegotiator$ExiConfirmationInboundHandler.channelRead(NetconfClientSessionNegotiator.java:216) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.52.Final]
    at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [bundleFile:4.1.52.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [bundleFile:4.1.52.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.52.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.52.Final]    at java.lang.Thread.run(Thread.java:829) [?:?]

It is fine that it fails, however the issue comes when the next re-connection attempt is launched. Even if it is successful, it will again fail with the same/similar error that some session promise has already been set.

See full logs with replicated bug (manually crafted buggy scenario by stopping execution using breakpoint in the NetconfClientSessionNegotiator#handleMessage): karaf.log

 



 Comments   
Comment by Jaroslav Tóth [ 02/Nov/21 ]

proposed workaround:

Comment by Robert Varga [ 14/Nov/21 ]

The logs are from Aluminium SR1. The code has shifted significantly since then and hence this needs to be first reproduced.

Comment by Robert Varga [ 31/May/22 ]

This issue seems to be replicated in NETCONF-880.

Comment by Robert Varga [ 31/May/22 ]

Based on the refreshed logs, the failure recorded is a AbstractNetconfSessionNegotiator guard and we are attempting to transition from FAILED to ESTABLISHED state. This would indicate we are reusing a negotiatior across multiple attempts or we have a fail-than-succeed on the stack.

Comment by Sangwook Ha [ 28/Sep/22 ]

This issue seems to be caused by the race between channelActive/channelRead & channel closure triggered by connection timeout - e.g. channelRead tries to change the state to ESTABLISHED right after timeoutExpired changed the state to FAILED and called channel.close().
Once this happens, promise is set to failed and following session negotiations all fail.

Comment by Ivan Martiniak [ 12/Oct/22 ]
java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for channel [id: 0x663f1d10]
at com.google.common.base.Preconditions.checkState(Preconditions.java:840) ~[bundleFile:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.changeState(AbstractNetconfSessionNegotiator.java:221) ~[bundleFile:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.getSessionForHelloMessage(AbstractNetconfSessionNegotiator.java:160) ~[bundleFile:?]
at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.handleMessage(NetconfClientSessionNegotiator.java:75) ~[bundleFile:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.channelRead(AbstractNetconfSessionNegotiator.java:313) [bundleFile:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.76.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) [bundleFile:4.1.76.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.76.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) [bundleFile:4.1.76.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.76.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.76.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.76.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [bundleFile:4.1.76.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [bundleFile:4.1.76.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [bundleFile:4.1.76.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.76.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.76.Final]
at java.lang.Thread.run(Thread.java:833) 
Comment by Robert Varga [ 13/Oct/22 ]

So this really is the cause: we are throwing an ISE in a context where we should either have seized working or report a documented exception. This is indicative a race – i.e. we are dispatching a message on the channel and while we are doing that we seem to have timed out.

Generated at Wed Feb 07 20:16:00 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.