Uploaded image for project: 'bgpcep'
  1. bgpcep
  2. BGPCEP-1023

Race during peer shutdown

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Medium Medium
    • 0.22.0
    • None
    • BGP
    • None

      We have the following splat in CSIT:

      2023-04-11T01:32:00,588 | INFO  | epollEventLoopGroup-7-6 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.3 went down
      2023-04-11T01:32:00,589 | INFO  | epollEventLoopGroup-7-6 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer
      2023-04-11T01:32:00,590 | INFO  | epollEventLoopGroup-7-6 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}] removed
      2023-04-11T01:32:00,591 | INFO  | epollEventLoopGroup-7-3 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.2}
      2023-04-11T01:32:00,591 | INFO  | epollEventLoopGroup-7-3 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x75f3dae8, L:/10.30.170.109:1790 ! R:/127.0.0.2:39923], state=UP}
      2023-04-11T01:32:00,592 | WARN  | opendaylight-cluster-data-shard-dispatcher-27 | ShardDataTree                    | 284 - org.opendaylight.controller.sal-distributed-datastore - 7.0.4 | member-1-shard-default-operational: Store Tx member-1-datastore-operational-fe-0-chn-249-txn-2-0: Conflicting modification for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}].
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | End of input detected. Close the session.
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.4 went down
      2023-04-11T01:32:00,601 | INFO  | epollEventLoopGroup-7-2 | BGPPeer                          | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer
      2023-04-11T01:32:00,602 | INFO  | epollEventLoopGroup-7-2 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.4}] removed
      2023-04-11T01:32:00,603 | INFO  | epollEventLoopGroup-7-2 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.4}
      2023-04-11T01:32:00,603 | INFO  | epollEventLoopGroup-7-2 | BGPSessionImpl                   | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x144dc834, L:/10.30.170.109:1790 ! R:/127.0.0.4:33429], state=UP}
      2023-04-11T01:32:00,602 | ERROR | epollEventLoopGroup-7-6 | AbstractPeer                     | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Write routes failed
      java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.tree.api.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}
      	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588) ~[?:?]
      	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567) ~[?:?]
      	at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.AbstractPeer.releaseRibOutChain(AbstractPeer.java:490) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnection(BGPPeer.java:576) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnectionGracefully(BGPPeer.java:666) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.onSessionDown(BGPPeer.java:531) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.endOfInput(BGPSessionImpl.java:335) ~[?:?]
      	at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.channelInactive(BGPSessionImpl.java:548) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?]
      	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?]
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) ~[?:?]
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?]
      	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[?:?]
      	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) ~[?:?]
      	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[?:?]
      	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[?:?]
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[?:?]
      	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) ~[?:?]
      	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
      	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
      	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[?:?]
      	at java.lang.Thread.run(Thread.java:833) ~[?:?]

      This seems to be an ordering issue in BGPPeer.releaseConnection(), in the not isRestartingGracefully() case:

      • we first terminateConnection(), which in turn runs removePeer() with AbstractPeer.domChain
      • we then releaseRibOutChain(), which in turn waits for transactions on AbstractPeer.ribOutChain

      Given these two are unsynchronized, the domChain transaction can jump in front of whatever is happening on ribOut.

            rovarga Robert Varga
            rovarga Robert Varga
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: