Uploaded image for project: 'bgpcep'
  1. bgpcep
  2. BGPCEP-1023

Race during peer shutdown

    XMLWordPrintable

Details

    • Bug
    • Status: Confirmed
    • Medium
    • Resolution: Unresolved
    • None
    • 0.19.9
    • BGP
    • None

    Description

      We have the following splat in CSIT:

      2023-04-11T01:32:00,588 | INFO  | epollEventLoopGroup-7-6 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.3 went down
      2023-04-11T01:32:00,589 | INFO  | epollEventLoopGroup-7-6 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer
      2023-04-11T01:32:00,590 | INFO  | epollEventLoopGroup-7-6 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}] removed
      2023-04-11T01:32:00,591 | INFO  | epollEventLoopGroup-7-3 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.2}
      2023-04-11T01:32:00,591 | INFO  | epollEventLoopGroup-7-3 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x75f3dae8, L:/10.30.170.109:1790 ! R:/127.0.0.2:39923], state=UP}
      2023-04-11T01:32:00,592 | WARN  | opendaylight-cluster-data-shard-dispatcher-27 | ShardDataTree                    | 284 - org.opendaylight.controller.sal-distributed-datastore - 7.0.4 | member-1-shard-default-operational: Store Tx member-1-datastore-operational-fe-0-chn-249-txn-2-0: Conflicting modification for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}].
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | End of input detected. Close the session.
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.4 went down
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer
      2023-04-11T01:32:00,602 | INFO  | epollEventLoopGroup-7-2 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.4}] removed
      2023-04-11T01:32:00,603 | INFO  | epollEventLoopGroup-7-2 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.4}
      2023-04-11T01:32:00,603 | INFO  | epollEventLoopGroup-7-2 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x144dc834, L:/10.30.170.109:1790 ! R:/127.0.0.4:33429], state=UP}
      2023-04-11T01:32:00,602 | ERROR | epollEventLoopGroup-7-6 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Write routes failed
      java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.tree.api.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}
      	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588) ~[?:?]
      	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567) ~[?:?]
      	at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.AbstractPeer.releaseRibOutChain(AbstractPeer.java:490) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnection(BGPPeer.java:576) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnectionGracefully(BGPPeer.java:666) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.onSessionDown(BGPPeer.java:531) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.endOfInput(BGPSessionImpl.java:335) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.channelInactive(BGPSessionImpl.java:548) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[?:?]
      	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) ~[?:?]
      	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[?:?]
      	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[?:?]
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[?:?]
      	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) ~[?:?]
      	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
      	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
      	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[?:?]
      	at java.lang.Thread.run(Thread.java:833) ~[?:?]

      This seems to be an ordering issue in BGPPeer.releaseConnection(), in the not isRestartingGracefully() case:

      • we first terminateConnection(), which in turn runs removePeer() with AbstractPeer.domChain
      • we then releaseRibOutChain(), which in turn waits for transactions on AbstractPeer.ribOutChain

      Given these two are unsynchronized, the domChain transaction can jump in front of whatever is happening on ribOut.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            rovarga Robert Varga
            rovarga Robert Varga
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: