[BGPCEP-221] "IllegalStateException: MpReachNlri codec not available" when pushing 10k routes or more
Created: 12/May/15  Updated: 03/Mar/19  Resolved: 03/Jun/15

Status: Verified
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration 1.0

Type: Bug
Reporter: Jozef Behran Assignee: Dana Kutenicsova
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 3186
Priority: High

 Description   

A "heisenbug" that occurs about 90% of the time. When a large number of routes is pushed, BGP dies and this exception is logged for every update. The 90% figure was reported to me for a case with 20 routes with linkstate; what I tested myself is a case with 10k routes without linkstate, which hits the bug in roughly 6 runs out of 7. The problem seems less likely to occur when fewer routes are pushed (the probability approaches zero at around 2000 routes) and much more likely to occur when linkstate is involved.

Steps to reproduce:
1. ODL_ROOT=<where_yourODL_installation_lives>
2. mkdir -p $ODL_ROOT/etc/opendaylight/karaf
3. cp $ODL_ROOT/system/org/opendaylight/bgpcep/bgp-controller-config/*/bgp-controller-config-0.4.0-SNAPSHOT.xml $ODL_ROOT/etc/opendaylight/karaf/41-bgp-example.xml
4. Uncomment the deactivated "single BGP peer" section in the newly created file ($ODL_ROOT/etc/opendaylight/karaf/41-bgp-example.xml).
5. Boot ODL.
6. Install features "odl-restconf", "odl-bgpcep-bgp-all" and "odl-netconf-connector-all".
7. Wait for ODL to fully load (run "top" in another console and wait until CPU usage of the massive Java process stays below 5%).
8. Get the tool from https://git.opendaylight.org/gerrit/#/c/19603/3/test/tools/fastbgp/play.py
9. python play.py --gencount=10000
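
Steps 2 and 3 have to be repeated on every fresh ODL installation, so they can be scripted. What follows is only a minimal sketch, assuming Python 3; the function name stage_bgp_example_config is made up for illustration and is not part of ODL or of play.py. It refuses to copy unless the wildcard from step 3 matches exactly one file.

```python
import glob
import os
import shutil


def stage_bgp_example_config(odl_root):
    """Create etc/opendaylight/karaf and copy the BGP example config into it.

    Hypothetical helper mirroring steps 2 and 3; returns the target path.
    """
    target_dir = os.path.join(odl_root, "etc", "opendaylight", "karaf")
    os.makedirs(target_dir, exist_ok=True)  # same effect as "mkdir -p"

    # Same wildcard as in step 3: one version directory is expected.
    pattern = os.path.join(
        odl_root, "system", "org", "opendaylight", "bgpcep",
        "bgp-controller-config", "*",
        "bgp-controller-config-0.4.0-SNAPSHOT.xml",
    )
    matches = glob.glob(pattern)
    if len(matches) != 1:
        raise RuntimeError(
            "expected exactly one config file, found %d" % len(matches))

    target = os.path.join(target_dir, "41-bgp-example.xml")
    shutil.copyfile(matches[0], target)
    return target
```

Calling stage_bgp_example_config(os.environ["ODL_ROOT"]) would stage the file; step 4 (uncommenting the peer section) still has to be done by hand.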

When the bug hits (if it does not, reboot ODL and try again), no routes make it to the RIB or the topology (use curl with the appropriate restconf URL to verify this) and the log file contains a heavy load of exceptions like this:

2015-05-11 15:13:10,329 | WARN | oupCloseable-3-3 | DefaultChannelPipeline | 149 - io.netty.common - 4.0.26.Final | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: MpReachNlri codec not available
at com.google.common.base.Preconditions.checkState(Preconditions.java:173)[94:com.google.guava:18.0.0]
at org.opendaylight.protocol.bgp.rib.impl.RIBSupportContextImpl.serialiazeReachNlri(RIBSupportContextImpl.java:168)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.RIBSupportContextImpl.writeRoutes(RIBSupportContextImpl.java:133)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.TableContext.writeRoutes(TableContext.java:49)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.AdjRibInWriter.updateRoutes(AdjRibInWriter.java:229)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.onMessage(BGPPeer.java:120)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.onMessage(BGPPeer.java:65)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.handleMessage(BGPSessionImpl.java:217)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.handleMessage(BGPSessionImpl.java:53)[259:org.opendaylight.bgpcep.bgp-rib-impl:0.4.0.SNAPSHOT]
at org.opendaylight.protocol.framework.AbstractProtocolSession.channelRead0(AbstractProtocolSession.java:53)[151:org.opendaylight.controller.protocol-framework:0.6.0.SNAPSHOT]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)[148:io.netty.transport:4.0.26.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)[174:io.netty.codec:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)[148:io.netty.transport:4.0.26.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)[174:io.netty.codec:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[148:io.netty.transport:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[148:io.netty.transport:4.0.26.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[149:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[149:io.netty.common:4.0.26.Final]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
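
Since the exception is logged once per update, counting its occurrences gives a quick way to tell whether a run hit the bug. This is only a sketch, assuming Python 3; the function names are made up for illustration, and the log location (typically data/log/karaf.log under $ODL_ROOT in a default Karaf setup) is an assumption, not something stated in this report.

```python
# Message text taken from the exception line in the trace above.
MARKER = "IllegalStateException: MpReachNlri codec not available"


def count_codec_errors(lines):
    """Return how many of the given log lines contain the exception message."""
    return sum(1 for line in lines if MARKER in line)


def count_codec_errors_in_file(path):
    """Count occurrences in a log file; undecodable bytes are replaced."""
    with open(path, errors="replace") as log:
        return count_codec_errors(log)
```

A result of zero after play.py finishes suggests the run did not hit the bug; a nonzero count is the number of matching exception lines in the log.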



 Comments   
Comment by Jozef Behran [ 13/May/15 ]

Possible fix for this problem, retesting needed:

https://git.opendaylight.org/gerrit/gitweb?p=bgpcep.git;a=commitdiff;h=e240cedc3bf07dfdd0206bab4bab9efce6bd1eec

Comment by Vratko Polak [ 14/May/15 ]

> Possible fix for this problem

The commit message calls it a workaround: https://git.opendaylight.org/gerrit/#/c/20015/1

Comment by Dana Kutenicsova [ 25/May/15 ]

https://git.opendaylight.org/gerrit/20994

Comment by Dana Kutenicsova [ 26/May/15 ]

https://git.opendaylight.org/gerrit/#/c/21085/
