[NETCONF-827] Mount loop when setting too low connection-timeout Created: 02/Nov/21 Updated: 31/Oct/22 Resolved: 13/Oct/22 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | 3.0.0, 4.0.0, 2.0.14, 2.0.16, 3.0.6, 4.0.2 |
| Fix Version/s: | 3.0.7, 4.0.3, 2.0.17 |
| Type: | Bug | Priority: | Medium |
| Reporter: | Jaroslav Tóth | Assignee: | Ivan Martiniak |
| Resolution: | Done | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Description |
|
After setting of small value for connection-timeout:
{
"node": [
{
"node-id": "testtool",
"netconf-node-topology:host": "127.0.0.1",
"netconf-node-topology:port": 36000,
"netconf-node-topology:keepalive-delay": 10,
"netconf-node-topology:tcp-only": false,
"netconf-node-topology:username": "admin",
"netconf-node-topology:password": "admin",
"netconf-node-topology:connection-timeout-millis": 100
}
]
}
sometimes the race condition between processing of received NETCONF hello message from server and cancellation of connection attempt occurs; symptoms: 021-11-02T20:28:47,433 | WARN | nioEventLoopGroupCloseable-3-12 | AbstractNetconfSessionNegotiator | 274 - org.opendaylight.netconf.netty-util - 1.9.1 | An exception occurred during negotiation with nulljava.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536]) at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:99) ~[bundleFile:4.1.52.Final] at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:79) ~[?:?] at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:27) ~[?:?] at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationSuccessful(AbstractNetconfSessionNegotiator.java:284) ~[?:?] at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.access$100(NetconfClientSessionNegotiator.java:43) ~[?:?] at org.opendaylight.netconf.client.NetconfClientSessionNegotiator$ExiConfirmationInboundHandler.channelRead(NetconfClientSessionNegotiator.java:216) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [bundleFile:4.1.52.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [bundleFile:4.1.52.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.52.Final] at java.lang.Thread.run(Thread.java:829) [?:?] 2021-11-02T20:28:47,437 | WARN | nioEventLoopGroupCloseable-3-12 | AbstractChannelHandlerContext | 53 - io.netty.common - 4.1.52.Final | An exception 'java.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536])' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:java.lang.IllegalStateException: complete already: NetconfSessionPromise@4b94e299(failure: java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for chanel [id: 0xa5b25536]) at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:99) ~[bundleFile:4.1.52.Final] at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:79) ~[?:?] at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:27) ~[?:?] at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationSuccessful(AbstractNetconfSessionNegotiator.java:284) ~[?:?] at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.access$100(NetconfClientSessionNegotiator.java:43) ~[?:?] at org.opendaylight.netconf.client.NetconfClientSessionNegotiator$ExiConfirmationInboundHandler.channelRead(NetconfClientSessionNegotiator.java:216) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.52.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.52.Final] at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [bundleFile:4.1.52.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [bundleFile:4.1.52.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.52.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.52.Final] at java.lang.Thread.run(Thread.java:829) [?:?] It is fine that it fails, however the issue comes when the next re-connection attempt is launched. Even if it is successful, it will again fail with the same/similar error that some session promise has already been set. See full logs with replicated bug (manually crafted buggy scenario by stopping execution using breakpoint in the NetconfClientSessionNegotiator#handleMessage): karaf.log
|
| Comments |
| Comment by Jaroslav Tóth [ 02/Nov/21 ] |
|
proposed workaround: |
| Comment by Robert Varga [ 14/Nov/21 ] |
|
The logs are from Aluminium SR1. The code has shifted significantly since then and hence this needs to be first reproduced. |
| Comment by Robert Varga [ 31/May/22 ] |
|
This issue seems to be replicated in |
| Comment by Robert Varga [ 31/May/22 ] |
|
Based on the refreshed logs, the failure recorded is a AbstractNetconfSessionNegotiator guard and we are attempting to transition from FAILED to ESTABLISHED state. This would indicate we are reusing a negotiatior across multiple attempts or we have a fail-than-succeed on the stack. |
| Comment by Sangwook Ha [ 28/Sep/22 ] |
|
This issue seems to be caused by the race between channelActive/channelRead & channel closure triggered by connection timeout - e.g. channelRead tries to change the state to ESTABLISHED right after timeoutExpired changed the state to FAILED and called channel.close(). |
| Comment by Ivan Martiniak [ 12/Oct/22 ] |
java.lang.IllegalStateException: Cannot change state from FAILED to ESTABLISHED for channel [id: 0x663f1d10] at com.google.common.base.Preconditions.checkState(Preconditions.java:840) ~[bundleFile:?] at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.changeState(AbstractNetconfSessionNegotiator.java:221) ~[bundleFile:?] at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.getSessionForHelloMessage(AbstractNetconfSessionNegotiator.java:160) ~[bundleFile:?] at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.handleMessage(NetconfClientSessionNegotiator.java:75) ~[bundleFile:?] at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.channelRead(AbstractNetconfSessionNegotiator.java:313) [bundleFile:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.76.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) [bundleFile:4.1.76.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.76.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) [bundleFile:4.1.76.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [bundleFile:4.1.76.Final] at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [bundleFile:4.1.76.Final] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [bundleFile:4.1.76.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [bundleFile:4.1.76.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) [bundleFile:4.1.76.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [bundleFile:4.1.76.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.76.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.76.Final] at java.lang.Thread.run(Thread.java:833) |
| Comment by Robert Varga [ 13/Oct/22 ] |
|
So this really is the cause: we are throwing an ISE in a context where we should either have seized working or report a documented exception. This is indicative a race – i.e. we are dispatching a message on the channel and while we are doing that we seem to have timed out. |