[NETCONF-830] Flood of "IllegalStateException: complete already" Created: 10/Nov/21 Updated: 18/Nov/21 Resolved: 18/Nov/21 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | 1.13.4 |
| Fix Version/s: | 2.0.4, 1.13.6 |
| Type: | Bug | Priority: | Medium |
| Reporter: | Andrej Mak | Assignee: | Ivan Hrasko |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
We use ODL netconf topology. It failed to establish connection to Netconf server because of folowing error. AbstractNetconfSessionNegotiator - Failed to send message <hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:capability:exi:1.0</capability>
<capability>urn:ietf:params:netconf:base:1.1</capability>
<capability>urn:ietf:params:netconf:base:1.0</capability>
</capabilities>
</hello>
java.lang.IllegalStateException: Channel closed
at org.opendaylight.netconf.nettyutil.handler.ssh.client.AsyncSshHandlerWriter.write(AsyncSshHandlerWriter.java:64) ~[netconf-netty-util-1.13.4.jar:?]
at org.opendaylight.netconf.nettyutil.handler.ssh.client.AsyncSshHandler.write(AsyncSshHandler.java:188) ~[netconf-netty-util-1.13.4.jar:?]
Then logs were flooded with exceptions: [WARN ] 2021-11-09 17:09:36.562 [nioEventLoopGroup-3-1] AbstractChannelHandlerContext - An exception 'java.lang.IllegalStateException: complete already: NetconfSessionPromise@546106c5(failure: java.lang.IllegalStateException: Channel closed)' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.IllegalStateException: complete already: NetconfSessionPromise@546106c5(failure: java.lang.IllegalStateException: Channel closed)
at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:112) ~[netty-common-4.1.65.Final.jar:4.1.65.Final]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationFailed(AbstractNetconfSessionNegotiator.java:290) ~[netconf-netty-util-1.13.4.jar:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.channelRead(AbstractNetconfSessionNegotiator.java:330) ~[netconf-netty-util-1.13.4.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [netty-codec-4.1.65.Final.jar:4.1.65.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [netty-codec-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [netty-codec-4.1.65.Final.jar:4.1.65.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [netty-codec-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:61) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:370) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.65.Final.jar:4.1.65.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.65.Final.jar:4.1.65.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.65.Final.jar:4.1.65.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.65.Final.jar:4.1.65.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.65.Final.jar:4.1.65.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.65.Final.jar:4.1.65.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalStateException: complete already: NetconfSessionPromise@546106c5(failure: java.lang.IllegalStateException: Channel closed)
at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:99) ~[netty-common-4.1.65.Final.jar:4.1.65.Final]
at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:79) ~[netconf-netty-util-1.13.4.jar:?]
at org.opendaylight.netconf.nettyutil.NetconfSessionPromise.setSuccess(NetconfSessionPromise.java:27) ~[netconf-netty-util-1.13.4.jar:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.negotiationSuccessful(AbstractNetconfSessionNegotiator.java:284) ~[netconf-netty-util-1.13.4.jar:?]
at org.opendaylight.netconf.client.NetconfClientSessionNegotiator.handleMessage(NetconfClientSessionNegotiator.java:93) ~[netconf-client-1.13.4.jar:?]
at org.opendaylight.netconf.nettyutil.AbstractNetconfSessionNegotiator.channelRead(AbstractNetconfSessionNegotiator.java:327) ~[netconf-netty-util-1.13.4.jar:?]
... 20 more
Nov 09, 2021 5:09:36 PM com.google.common.util.concurrent.AbstractFuture executeListener
SEVERE: RuntimeException while executing runnable CallbackListener{org.opendaylight.netconf.sal.connect.netconf.NetconfDevice$1@7230cd8c} with executor MoreExecutors.directExecutor()
java.lang.IllegalStateException: Already initialized
at com.google.common.base.Preconditions.checkState(Preconditions.java:508)
at org.opendaylight.netconf.sal.connect.netconf.sal.NetconfDeviceSalProvider$MountInstance.onTopologyDeviceConnected(NetconfDeviceSalProvider.java:127)
at org.opendaylight.netconf.sal.connect.netconf.sal.NetconfDeviceSalFacade.onDeviceConnected(NetconfDeviceSalFacade.java:87)
at org.opendaylight.netconf.sal.connect.netconf.sal.NetconfDeviceSalFacade.onDeviceConnected(NetconfDeviceSalFacade.java:43)
at org.opendaylight.netconf.sal.connect.netconf.NetconfDevice.handleSalInitializationSuccess(NetconfDevice.java:262)
at org.opendaylight.netconf.sal.connect.netconf.NetconfDevice$1.onSuccess(NetconfDevice.java:184)
at org.opendaylight.netconf.sal.connect.netconf.NetconfDevice$1.onSuccess(NetconfDevice.java:181)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1089)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1174)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:969)
at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:800)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.setResult(AbstractTransformFuture.java:224)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.setResult(AbstractTransformFuture.java:202)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.MoreExecutors$5$1.run(MoreExecutors.java:1091)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
It wasn't possible to make netconf connect again with deleting and creating mount point again. We checked, that netconf server is running and we were able to send netconf RPCs via ssh. |
| Comments |
| Comment by Ivan Hrasko [ 10/Nov/21 ] |
|
andmak Are there steps how to reproduce the issue? |
| Comment by Andrej Mak [ 10/Nov/21 ] |
|
There are no simple steps. Issue happens only on some of tens of devices. I think it is related to reconnecting - if something goes wrong, netconf tries to reconnect, but fails to update netconf mountpoint session state(java.lang.IllegalStateException: Already initialized). |
| Comment by Ivan Hrasko [ 10/Nov/21 ] |
|
Are you using clustered NETCONF topology with 3-node cluster? Is it possible that your leader node has changed during the process? |
| Comment by Ivan Hrasko [ 10/Nov/21 ] |
|
You write: It wasn't possible to make netconf connect again with deleting and creating mount point again. Have you also checked OPERATIONAL data? When you delete device from CONFIG data is it deleted from OPERATIONAL as well? When you create device in OPERATIONAL is it created in OPERATIONAL as well? |
| Comment by Robert Varga [ 10/Nov/21 ] |
|
andmak can you repro with netconf-2.0.8? |
| Comment by Andrej Mak [ 11/Nov/21 ] |
|
We don't use cluster. |
| Comment by Andrej Mak [ 11/Nov/21 ] |
|
No, I haven't checked operational data. Reconnect just deletes config and then waits until MountPoint listener says, that mountpoint was removed. Then it creates same neconf node in config datastore again. |
| Comment by Andrej Mak [ 11/Nov/21 ] |
|
I see, that fix version in linked issue is 2.0.4. Would testing with 2.0.5 be enough? |
| Comment by Ivan Hrasko [ 11/Nov/21 ] |
|
The latest, the better - but probably yes |
| Comment by Ivan Hrasko [ 15/Nov/21 ] |
|
I looks you are not the only one complaining on "complete already": https://git.opendaylight.org/gerrit/c/netconf/+/98273 |
| Comment by Ivan Hrasko [ 16/Nov/21 ] |
|
OK, this patch is misleading. |
| Comment by Ivan Hrasko [ 16/Nov/21 ] |
|
It is difficult to reproduce this issue but we can see that the problem with the first connection was on line: AsyncSshHandlerWriter.java:64 It means the following code: promise.setFailure(new IllegalStateException("Channel closed")); Thus it is possible to simulate the issue by injecting a code that will invoke the same code as above when we are trying to establish connection to the device for the first time. Really, it was possible to reproduce the issue with the version 1.13.4, please see attached: 1.13.4-reproduced.karaf.log Steps to reproduce are:
With the current master branch - version 2.0.10-SNAPSHOT it is possible to observe behavior as we expect:
For example the version 2.0.4 is working, too. |
| Comment by Ivan Hrasko [ 16/Nov/21 ] |
|
The cherry-picked code available here for |
| Comment by Ivan Hrasko [ 18/Nov/21 ] |
|
When the code available here is applied the problem is fixed. It helps even when it is applied alone without any other cherry-pick on 1.13.4 version. |
| Comment by Ivan Hrasko [ 18/Nov/21 ] |
|
It means that this issue can be fixed in the 1.13.6 version in Silicon SR4 and it is already fixed in the 2.0.4 version in Phosphorus-SR0. . |