[BGPCEP-915] Controller-exabgp session fails in BGP policy test Created: 11/Jul/20  Updated: 29/Jul/20  Resolved: 29/Jul/20

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: None
Fix Version/s: Aluminium

Type: Bug Priority: Medium
Reporter: Luis Gomez Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The issue is only happening in Aluminium and it is tracked here: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-userfeatures-all-aluminium/130/robot-plugin/log.html.gz

When exabgp tries to connect, the controller closes the connection:

01:53:10 | 15011  | reactor         | new peer: neighbor 10.30.170.37 local-ip 127.0.0.1 local-as 64496 peer-as 64496 router-id 127.0.0.1 family-allowed in-open
01:53:10 | 15011  | reactor         | loaded new configuration successfully
01:53:10 | 15011  | reactor         | initialising connection to peer-1
01:53:10 | 15011  | outgoing-1      | attempting connection to 10.30.170.37:1790
01:53:10 | 15011  | outgoing-1      | sending TCP payload ( 177) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 00B1 0104 FBF0 00B4 7F00 0001 9402 0601 0400 0100 0102 0601 0400 0100 0202 0601 0400 0100 0402 0601 0400 0100 8002 0601 0400 0100 8402 0601 0400 0100 8502 0601 0400 0100 8602 0601 0400 0200 0102 0601 0400 0200 0202 0601 0400 0200 0402 0601 0400 0200 8002 0601 0400 0200 8502 0601 0400 0200 8602 0601 0400 1900 4102 0601 0400 1900 4602 0601 0440 0400 4702 0601 0440 0400 4802 0206 0002 0641 0400 00FB F0
01:53:10 | 15011  | outgoing-1      | >> OPEN version=4 asn=64496 hold_time=180 router_id=127.0.0.1 capabilities=[Multiprotocol(ipv4 unicast,ipv4 multicast,ipv4 nlri-mpls,ipv4 mpls-vpn,ipv4 rtc,ipv4 flow,ipv4 flow-vpn,ipv6 unicast,ipv6 multicast,ipv6 nlri-mpls,ipv6 mpls-vpn,ipv6 flow,ipv6 flow-vpn,l2vpn vpls,l2vpn evpn,bgp-ls bgp-ls,bgp-ls bgp-ls-vpn), Extended Message(65535), ASN4(64496)]
01:53:10 | 15011  | outgoing-1      | received TCP payload (  19) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0077 01
01:53:10 | 15011  | outgoing-1      | received TCP payload ( 100) 04FB F000 B4C0 0002 025A 0258 4104 0000 FBF0 0600 0200 0104 0019 0046 0104 0001 0080 0104 0002 0004 0104 4004 0047 0104 0002 0086 0104 0001 0004 0104 0001 0086 0104 0002 0001 0104 0001 0001 0104 0001 0085 0104 0002 0080 0104 0002 0085 4002 00B4 4700
01:53:10 | 15011  | outgoing-1      | << message of type OPEN
01:53:10 | 15011  | outgoing-1      | << OPEN version=4 asn=64496 hold_time=180 router_id=192.0.2.2 capabilities=[Multiprotocol(l2vpn evpn,ipv4 mpls-vpn,ipv6 nlri-mpls,bgp-ls bgp-ls,ipv6 flow-vpn,ipv4 nlri-mpls,ipv4 flow-vpn,ipv6 unicast,ipv4 unicast,ipv4 flow,ipv6 mpls-vpn,ipv6 flow), Route Refresh, Extended Message(65535), Graceful Restart Flags 0x0 Time 180 , ASN4(64496), Unassigned 71]
01:53:10 | 15011  | ka-outgoing-1   | receive-timer 60 second(s) left
01:53:10 | 15011  | outgoing-1      | --------------------------------------------------------------------
01:53:10 | 15011  | outgoing-1      | the connection can not carry the following family/families
01:53:10 | 15011  | outgoing-1      |  - peer is not configured for ipv4/multicast
01:53:10 | 15011  | outgoing-1      |  - peer is not configured for bgp-ls/bgp-ls-vpn
01:53:10 | 15011  | outgoing-1      |  - peer is not configured for ipv4/rtc
01:53:10 | 15011  | outgoing-1      |  - peer is not configured for l2vpn/vpls
01:53:10 | 15011  | outgoing-1      |  - peer is not configured for ipv6/multicast
01:53:10 | 15011  | outgoing-1      | therefore no routes of this kind can be announced on the connection
01:53:10 | 15011  | outgoing-1      | --------------------------------------------------------------------
01:53:10 | 15011  | outgoing-1      | sending TCP payload (  19) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0013 04
01:53:10 | 15011  | outgoing-1      | >> KEEPALIVE (OPENCONFIRM)
01:53:10 | 15011  | outgoing-1      | outgoing-1 127.0.0.1-10.30.170.37, closing connection

In the karaf log we can see this ERROR:

2020-07-10T01:53:09,979 | WARN  | epollEventLoopGroup-8-7 | AbstractBGPSessionNegotiator     | 240 - org.opendaylight.bgpcep.bgp-rib-impl - 0.14.0.SNAPSHOT | Channel [id: 0xd11bec1e, L:/10.30.170.37:1790 ! R:/127.0.0.2:44564] state OPEN_SENT unexpected message Keepalive{augmentation=[]}
2020-07-10T01:53:09,979 | WARN  | epollEventLoopGroup-8-7 | AbstractBGPSessionNegotiator     | 240 - org.opendaylight.bgpcep.bgp-rib-impl - 0.14.0.SNAPSHOT | Failed to send message Notify{_errorCode=5, _errorSubcode=0, augmentation=[]} to channel [id: 0xd11bec1e, L:/10.30.170.37:1790 ! R:/127.0.0.2:44564]
java.nio.channels.ClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) ~[bundleFile:4.1.49.Final]
	at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:113) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025) ~[bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294) ~[bundleFile:4.1.49.Final]
	at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.sendMessage(AbstractBGPSessionNegotiator.java:273) [bundleFile:?]
	at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.handleMessage(AbstractBGPSessionNegotiator.java:181) [bundleFile:?]
	at org.opendaylight.protocol.bgp.rib.impl.AbstractBGPSessionNegotiator.channelRead(AbstractBGPSessionNegotiator.java:295) [bundleFile:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [bundleFile:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [bundleFile:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [bundleFile:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [bundleFile:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [bundleFile:4.1.49.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [bundleFile:4.1.49.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [bundleFile:4.1.49.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [bundleFile:4.1.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [bundleFile:4.1.49.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [bundleFile:4.1.49.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [bundleFile:4.1.49.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]


 Comments   
Comment by Robert Varga [ 22/Jul/20 ]

To qualify this a bit more, the failure looks like this:

2020-07-10T01:53:08,564 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Start_Exabgps" | core                             | 123 - org.apache.karaf.log.core - 4.2.8 | ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Start_Exabgps
2020-07-10T01:53:09,809 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Verify_Rib_Filled" | core                             | 123 - org.apache.karaf.log.core - 4.2.8 | ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Verify_Rib_Filled
2020-07-10T01:53:09,976 | INFO  | epollEventLoopGroup-8-7 | MultiprotocolCapabilitiesUtil    | 236 - org.opendaylight.bgpcep.bgp-parser-spi - 0.14.0.SNAPSHOT | Unsupported SAFI 2 parsed.
2020-07-10T01:53:09,977 | INFO  | epollEventLoopGroup-8-7 | CapabilityParameterParser        | 235 - org.opendaylight.bgpcep.bgp-parser-impl - 0.14.0.SNAPSHOT | Ignoring unsupported capability 1
2020-07-10T01:53:09,977 | INFO  | epollEventLoopGroup-8-7 | MultiprotocolCapabilitiesUtil    | 236 - org.opendaylight.bgpcep.bgp-parser-spi - 0.14.0.SNAPSHOT | Unsupported SAFI 2 parsed.
2020-07-10T01:53:09,977 | INFO  | epollEventLoopGroup-8-7 | CapabilityParameterParser        | 235 - org.opendaylight.bgpcep.bgp-parser-impl - 0.14.0.SNAPSHOT | Ignoring unsupported capability 1
2020-07-10T01:53:09,978 | INFO  | epollEventLoopGroup-8-7 | MultiprotocolCapabilitiesUtil    | 236 - org.opendaylight.bgpcep.bgp-parser-spi - 0.14.0.SNAPSHOT | Unsupported SAFI 65 parsed.
2020-07-10T01:53:09,978 | INFO  | epollEventLoopGroup-8-7 | CapabilityParameterParser        | 235 - org.opendaylight.bgpcep.bgp-parser-impl - 0.14.0.SNAPSHOT | Ignoring unsupported capability 1
2020-07-10T01:53:09,978 | INFO  | epollEventLoopGroup-8-7 | MultiprotocolCapabilitiesUtil    | 236 - org.opendaylight.bgpcep.bgp-parser-spi - 0.14.0.SNAPSHOT | Unsupported SAFI 72 parsed.
2020-07-10T01:53:09,978 | INFO  | epollEventLoopGroup-8-7 | CapabilityParameterParser        | 235 - org.opendaylight.bgpcep.bgp-parser-impl - 0.14.0.SNAPSHOT | Ignoring unsupported capability 1
2020-07-10T01:53:09,979 | WARN  | epollEventLoopGroup-8-7 | AbstractBGPSessionNegotiator     | 240 - org.opendaylight.bgpcep.bgp-rib-impl - 0.14.0.SNAPSHOT | Channel [id: 0xd11bec1e, L:/10.30.170.37:1790 ! R:/127.0.0.2:44564] state OPEN_SENT unexpected message Keepalive{augmentation=[]}
2020-07-10T01:53:09,979 | WARN  | epollEventLoopGroup-8-7 | AbstractBGPSessionNegotiator     | 240 - org.opendaylight.bgpcep.bgp-rib-impl - 0.14.0.SNAPSHOT | Failed to send message Notify{_errorCode=5, _errorSubcode=0, augmentation=[]} to channel [id: 0xd11bec1e, L:/10.30.170.37:1790 ! R:/127.0.0.2:44564]
java.nio.channels.ClosedChannelException: null

whereas the same test case succeeding is:

2020-07-18T02:28:16,038 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/bgpcep-csit-1node-userfeatures-all-magnesium/test/csit/suites/bgpcep/bgpfunct/bgp_policies_default.robot" | core                             | 124 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Start_Exabgps
2020-07-18T02:28:16,362 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/bgpcep-csit-1node-userfeatures-all-magnesium/test/csit/suites/bgpcep/bgpfunct/bgp_policies_default.robot" | core                             | 124 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test bgpcep-userfeatures.txt.Bgp Policies Default.Verify_Rib_Filled
2020-07-18T02:28:16,408 | INFO  | epollEventLoopGroup-8-4 | MultiprotocolCapabilitiesUtil    | 235 - org.opendaylight.bgpcep.bgp-parser-spi - 0.13.2.SNAPSHOT | Unsupported SAFI 2 parsed.
2020-07-18T02:28:16,409 | INFO  | epollEventLoopGroup-8-4 | CapabilityParameterParser        | 234 - org.opendaylight.bgpcep.bgp-parser-impl - 0.13.2.SNAPSHOT | Ignoring unsupported capability 1
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | MultiprotocolCapabilitiesUtil    | 235 - org.opendaylight.bgpcep.bgp-parser-spi - 0.13.2.SNAPSHOT | Unsupported SAFI 2 parsed.
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | CapabilityParameterParser        | 234 - org.opendaylight.bgpcep.bgp-parser-impl - 0.13.2.SNAPSHOT | Ignoring unsupported capability 1
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | MultiprotocolCapabilitiesUtil    | 235 - org.opendaylight.bgpcep.bgp-parser-spi - 0.13.2.SNAPSHOT | Unsupported SAFI 65 parsed.
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | CapabilityParameterParser        | 234 - org.opendaylight.bgpcep.bgp-parser-impl - 0.13.2.SNAPSHOT | Ignoring unsupported capability 1
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | MultiprotocolCapabilitiesUtil    | 235 - org.opendaylight.bgpcep.bgp-parser-spi - 0.13.2.SNAPSHOT | Unsupported SAFI 72 parsed.
2020-07-18T02:28:16,410 | INFO  | epollEventLoopGroup-8-4 | CapabilityParameterParser        | 234 - org.opendaylight.bgpcep.bgp-parser-impl - 0.13.2.SNAPSHOT | Ignoring unsupported capability 1
2020-07-18T02:28:16,411 | INFO  | epollEventLoopGroup-8-4 | StrictBGPPeerRegistry            | 239 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.2.SNAPSHOT | BGP Open message session parameters differ, session still accepted.
2020-07-18T02:28:16,411 | INFO  | epollEventLoopGroup-8-4 | BGPSessionImpl                   | 239 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.2.SNAPSHOT | BGP HoldTimer new value: 180
2020-07-18T02:28:16,421 | INFO  | epollEventLoopGroup-8-4 | BGPPeer                          | 239 - org.opendaylight.bgpcep.bgp-rib-impl - 0.13.2.SNAPSHOT | Session with peer 127.0.0.1 went up with tables [BgpTableTypeImpl [getAfi()=interface 

which looks like a policy thing – i.e. instead of sending an ack, we seem to be in the failure territory. Furthermore it looks like the channel is closed. We then receive the keepalive, and fail to send the FSM error.

Generated at Wed Feb 07 19:14:28 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.