We have the following splat in CSIT:
2023-04-11T01:32:00,588 | INFO | epollEventLoopGroup-7-6 | BGPPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.3 went down 2023-04-11T01:32:00,589 | INFO | epollEventLoopGroup-7-6 | BGPPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer 2023-04-11T01:32:00,590 | INFO | epollEventLoopGroup-7-6 | AbstractPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}] removed 2023-04-11T01:32:00,591 | INFO | epollEventLoopGroup-7-3 | AbstractPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.2} 2023-04-11T01:32:00,591 | INFO | epollEventLoopGroup-7-3 | BGPSessionImpl | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x75f3dae8, L:/10.30.170.109:1790 ! R:/127.0.0.2:39923], state=UP} 2023-04-11T01:32:00,592 | WARN | opendaylight-cluster-data-shard-dispatcher-27 | ShardDataTree | 284 - org.opendaylight.controller.sal-distributed-datastore - 7.0.4 | member-1-shard-default-operational: Store Tx member-1-datastore-operational-fe-0-chn-249-txn-2-0: Conflicting modification for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}]. 2023-04-11T01:32:00,601 | INFO | epollEventLoopGroup-7-2 | BGPSessionImpl | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | End of input detected. Close the session. 2023-04-11T01:32:00,601 | INFO | epollEventLoopGroup-7-2 | BGPPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Session with peer 127.0.0.4 went down 2023-04-11T01:32:00,601 | INFO | epollEventLoopGroup-7-2 | BGPPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session with peer 2023-04-11T01:32:00,602 | INFO | epollEventLoopGroup-7-2 | AbstractPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closed per Peer /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.4}] removed 2023-04-11T01:32:00,603 | INFO | epollEventLoopGroup-7-2 | AbstractPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing peer chain Uri{value=bgp://127.0.0.4} 2023-04-11T01:32:00,603 | INFO | epollEventLoopGroup-7-2 | BGPSessionImpl | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Closing session: BGPSessionImpl{channel=[id: 0x144dc834, L:/10.30.170.109:1790 ! R:/127.0.0.4:33429], state=UP} 2023-04-11T01:32:00,602 | ERROR | epollEventLoopGroup-7-6 | AbstractPeer | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.19.4 | Write routes failed java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/peer/peer[{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)peer-id=bgp://127.0.0.3}], severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.tree.api.ConflictingModificationAppliedException: Node was deleted by other transaction.]]} at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588) ~[?:?] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567) ~[?:?] at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.AbstractPeer.releaseRibOutChain(AbstractPeer.java:490) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnection(BGPPeer.java:576) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.releaseConnectionGracefully(BGPPeer.java:666) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.BGPPeer.onSessionDown(BGPPeer.java:531) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.endOfInput(BGPSessionImpl.java:335) ~[?:?] at org.opendaylight.protocol.bgp.rib.impl.BGPSessionImpl.channelInactive(BGPSessionImpl.java:548) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?] at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:411) ~[?:?] at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:376) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[?:?] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) ~[?:?] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[?:?] at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[?:?] at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) ~[?:?] at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[?:?] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[?:?] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[?:?] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) ~[?:?] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?]
This seems to be an ordering issue in BGPPeer.releaseConnection(), in the not isRestartingGracefully() case:
- we first terminateConnection(), which in turn runs removePeer() with AbstractPeer.domChain
- we then releaseRibOutChain(), which in turn waits for transactions on AbstractPeer.ribOutChain
Given these two are unsynchronized, the domChain transaction can jump in front of whatever is happening on ribOut.