[BGPCEP-328] Immediate reconnects by BGPReconnectPromise Created: 08/Dec/15  Updated: 03/Mar/19  Resolved: 28/Jan/16

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Vratko Polak Assignee: Milos Fabian
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 4740

 Description   

This Beryllium bug typically happens when ODL starts with BGP peer configured (in 41-*.xml, with small holdtimer) and ready. ODL ends up in a loop rapidly reconnecting, which pollutes karaf.log and makes debugging unrelated bugs very hard.
Eventually, connect is successful and system test passes, but karaf.log gets full before that.

BGPProtocolSessionPromise and BGPReconnectPromise should be reworked in such a way that no failure leads to immediate reconnect.

Here is a segment of karaf.log showing start of the loop:

2015-12-08 12:28:00,972 | TRACE | oupCloseable-6-2 | BGPMessageToByteEncoder | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Encoding message: Notify [_errorCode=6, _errorSubcode=0, augmentation=[]]
2015-12-08 12:28:00,973 | TRACE | oupCloseable-6-2 | BGPNotificationMessageParser | 244 - org.opendaylight.bgpcep.bgp-parser-impl - 0.5.0.SNAPSHOT | Notification message serialized to: 0600
2015-12-08 12:28:00,973 | INFO | CommitFutures-0 | BGPSessionImpl | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Closing session: BGPSessionImpl

{channel=[id: 0xe2c1bbe5, /10.25.2.9:45622 => /10.25.2.27:179], state=IDLE}

2015-12-08 12:28:00,973 | TRACE | oupCloseable-6-2 | BGPMessageToByteEncoder | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Encoded message: ffffffffffffffffffffffffffffffff0015030600
2015-12-08 12:28:00,973 | DEBUG | oupCloseable-6-2 | BGPMessageToByteEncoder | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Message sent to output: Notify [_errorCode=6, _errorSubcode=0, augmentation=[]]
2015-12-08 12:28:00,977 | DEBUG | CommitFutures-0 | BGPPeer | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Transaction chain org.opendaylight.controller.cluster.databroker.DOMBrokerTransactionChain@5b6eb68f successfull.
2015-12-08 12:28:00,977 | TRACE | oupCloseable-6-2 | BGPSessionImpl | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Message Notify [_errorCode=6, _errorSubcode=0, augmentation=[]] sent to socket [id: 0xe2c1bbe5, /10.25.2.9:45622 => /10.25.2.27:179]
2015-12-08 12:28:00,982 | TRACE | oupCloseable-6-2 | BGPByteToMessageDecoder | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | No more content in incoming buffer.
2015-12-08 12:28:00,983 | DEBUG | oupCloseable-6-2 | BGPSessionImpl | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Channel [id: 0xe2c1bbe5, /10.25.2.9:45622 :> /10.25.2.27:179] inactive.
2015-12-08 12:28:00,983 | DEBUG | oupCloseable-6-2 | BGPReconnectPromise | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Reconnecting after connection to /10.25.2.27:179 was dropped
2015-12-08 12:28:00,985 | DEBUG | oupCloseable-6-2 | BGPProtocolSessionPromise | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Promise BGPProtocolSessionPromise@1fbce2ac(incomplete) attempting connect for 5000ms
2015-12-08 12:28:00,989 | DEBUG | oupCloseable-6-2 | BGPReconnectPromise | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Client created.
2015-12-08 12:28:00,997 | DEBUG | oupCloseable-6-5 | BGPProtocolSessionPromise | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Promise BGPProtocolSessionPromise@1fbce2ac(incomplete) connection resolved
2015-12-08 12:28:00,998 | DEBUG | oupCloseable-6-5 | BGPProtocolSessionPromise | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Promise BGPProtocolSessionPromise@1fbce2ac(incomplete) connection successful
2015-12-08 12:28:00,999 | DEBUG | oupCloseable-6-5 | AbstractBGPSessionNegotiator | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Starting session negotiation on channel [id: 0x2ea5cd45, /10.25.2.9:45623 => /10.25.2.27:179]

Subsequent iterations are caused like this:

2015-12-08 12:28:01,007 | INFO | oupCloseable-6-5 | AbstractBGPSessionNegotiator | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Failed to send message Open [<snip>]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)[:1.7.0_67]
at sun.nio.ch.SocketDispatcher.write(Unknown Source)[:1.7.0_67]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)[:1.7.0_67]
at sun.nio.ch.IOUtil.write(Unknown Source)[:1.7.0_67]
at sun.nio.ch.SocketChannelImpl.write(Unknown Source)[:1.7.0_67]
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:287)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:761)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:311)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:729)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1127)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:693)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:681)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:716)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:954)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:244)[124:io.netty.transport:4.0.30.Final]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.sendMessage(AbstractBGPSessionNegotiator.java:245)[252:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.startNegotiation(AbstractBGPSessionNegotiator.java:111)[252:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.channelActive(AbstractBGPSessionNegotiator.java:264)[252:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:183)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:169)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:183)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:169)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:183)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:169)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:817)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:260)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:290)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[124:io.netty.transport:4.0.30.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[124:io.netty.transport:4.0.30.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:110)[123:io.netty.common:4.0.30.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[123:io.netty.common:4.0.30.Final]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
2015-12-08 12:28:01,009 | DEBUG | oupCloseable-6-5 | AbstractBGPSessionNegotiator | 252 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Negotiation on channel [id: 0x2ea5cd45, /10.25.2.9:45623 :> /10.25.2.27:179] failed
java.io.IOException: Connection reset by peer
... (the same stacktrace as above)



 Comments   
Comment by Iveta Halanova [ 05/Jan/16 ]

connected to this test suit:
http://sn-bldsg.cisco.com/jenkins/job/test_beryllium_repacked_bgpcep07/

Comment by Iveta Halanova [ 05/Jan/16 ]

Now there's issue with PCEP session, but of similar nature as was for BGP - redundant negotiation thread probably because of low hold timer.

Comment by Milos Fabian [ 26/Jan/16 ]

Not exactly the same, but similar repeatedly appearing in the log file when
starting with 5+ features (including bgp one)

2016-01-25 22:13:49,659 | ERROR | oupCloseable-2-7 | BGPDocumentedException | 99 - org.opendaylight.bgpcep.bgp-parser-api - 0.5.0.SNAPSHOT | Error = FSM_ERROR
org.opendaylight.protocol.bgp.parser.BGPDocumentedException: HoldTimer expired
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator$1.run(AbstractBGPSessionNegotiator.java:121)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)[90:io.netty.common:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)[91:io.netty.transport:4.0.33.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[90:io.netty.common:4.0.33.Final]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
2016-01-25 22:13:49,659 | WARN | oupCloseable-2-7 | AbstractBGPSessionNegotiator | 184 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Channel [id: 0xc5985893, 0.0.0.0/0.0.0.0:41958 :> /10.25.2.27:179] negotiation failed: HoldTimer expired
2016-01-25 22:13:49,659 | WARN | oupCloseable-2-7 | AbstractBGPSessionNegotiator | 184 - org.opendaylight.bgpcep.bgp-rib-impl - 0.5.0.SNAPSHOT | Failed to send message Notify [_errorCode=5, _errorSubcode=0, augmentation=[]]
java.nio.channels.ClosedChannelException
2016-01-25 22:13:49,659 | WARN | oupCloseable-2-7 | DefaultPromise | 90 - io.netty.common - 4.0.33.Final | An exception was thrown by org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator$2.operationComplete()
java.lang.IllegalStateException: complete already: BGPProtocolSessionPromise@802d9d2(failure: java.io.IOException: Connection reset by peer)
at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:418)[90:io.netty.common:4.0.33.Final]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.negotiationFailedCloseChannel(AbstractBGPSessionNegotiator.java:241)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.access$500(AbstractBGPSessionNegotiator.java:44)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator$2.operationComplete(AbstractBGPSessionNegotiator.java:250)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator$2.operationComplete(AbstractBGPSessionNegotiator.java:245)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)[90:io.netty.common:4.0.33.Final]
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)[91:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)[91:io.netty.transport:4.0.33.Final]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.sendMessage(AbstractBGPSessionNegotiator.java:245)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.negotiationFailed(AbstractBGPSessionNegotiator.java:207)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.access$300(AbstractBGPSessionNegotiator.java:44)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator$1.run(AbstractBGPSessionNegotiator.java:121)[184:org.opendaylight.bgpcep.bgp-rib-impl:0.5.0.SNAPSHOT]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)[90:io.netty.common:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)[91:io.netty.transport:4.0.33.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[90:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[90:io.netty.common:4.0.33.Final]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
Caused by: java.nio.channels.ClosedChannelException

After applying https://git.opendaylight.org/gerrit/#/c/31492/ the issue disappeared, or hold-timer expiration occurs only once.

Generated at Wed Feb 07 19:12:43 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.