[BGPCEP-988] Regression with BGP Link State Created: 09/Nov/21 Updated: 21/Nov/21 Resolved: 21/Nov/21 |
|
| Status: | Resolved |
| Project: | bgpcep |
| Component/s: | BGP |
| Affects Version/s: | 0.16.8, 0.16.9 |
| Fix Version/s: | 0.16.11 |
| Type: | Bug | Priority: | High |
| Reporter: | Olivier Dugeon | Assignee: | Marek Zaťko |
| Resolution: | Done | Votes: | 0 |
| Labels: | regression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
BGP speaker is a Juniper MX 960 with JunOS 21.1R1.11. Karaf is a fresh compilation of BGPCEP master. Note that the problem does not occur with Phosphorus release. |
||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Priority: | Highest | ||||||||||||
| Description |
|
Since release bgpcep-0.16.8, BGP is failing to start properly and establish a BGP session with a peer for Link State family.
A fist set of error occurs when BGP Topology Deployer started: opendaylight-cluster-data-akka.actor.default-dispatcher-5 | ClusterSingletonServiceGroupImpl | 261 - org.opendaylight.mdsal.singleton-dom-impl - 8.0.7 | Service group example-bgp-rib-service-group service org.opendaylight.protocol.bgp.rib.impl.config.BGPClusterSingletonService@3a8d67e1 failed to start, attempting to continue
Then a problem occurs with data store when adding a peer: org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[\{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/loc-rib/tables/tables[\{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)afi=(urn:opendaylight:params:xml:ns:yang:bgp-linkstate?revision=2020-01-20)linkstate-address-family, (urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)safi=(urn:opendaylight:params:xml:ns:yang:bgp-linkstate?revision=2020-01-20)linkstate-subsequent-address-family}]/attributes This conduct to many errors when attempting to write Link State information into the Datastore. Full log is attached. |
| Comments |
| Comment by Marek Zaťko [ 11/Nov/21 ] |
|
Hi Olivier, can you please describe step by step, how are you starting the karaf and which bundles/features are you installing upon startup? Because I was not able to reproduce this issue, I tried felix and equinox. Reason I'm asking this is because in provided logs I see this chunk at startup: 2021-11-09T11:35:39,861 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started. 2021-11-09T11:35:39,885 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Stopping 0 StatementProviderActivator instances 2021-11-09T11:35:39,915 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Starting 1 StatementProviderActivator instances 2021-11-09T11:35:39,957 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.bgpcep.bgp-peer-acceptor/0.16.10.SNAPSHOT is waiting for dependencies [(&(|(type=default)(!(type=*)))(objectClass=org.opendaylight.mdsal.binding.api.DataBroker))] 2021-11-09T11:35:39,974 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered 2021-11-09T11:35:39,994 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started. 2021-11-09T11:35:40,002 | INFO | opendaylight-cluster-data-notification-dispatcher-41 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered 2021-11-09T11:35:40,035 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.netconf.restconf-nb-bierman02/2.0.8 is waiting for dependencies [(objectClass=org.opendaylight.aaa.web.WebContextSecurer)] 2021-11-09T11:35:40,037 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.netconf.restconf-nb-rfc8040/2.0.8 is waiting for dependencies [(objectClass=org.opendaylight.aaa.web.WebContextSecurer)] 2021-11-09T11:35:40,037 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiDistributedDataStore | 220 - org.opendaylight.controller.sal-distributed-datastore - 4.0.6 | Distributed Datastore type CONFIGURATION started 2021-11-09T11:35:40,046 | INFO | opendaylight-cluster-data-notification-dispatcher-41 | LazyBindingMap | 226 - org.opendaylight.mdsal.binding-dom-codec - 8.0.7 | Using lazy population for maps larger than 1 element(s) 2021-11-09T11:35:40,064 | INFO | Blueprint Extender: 1 | H2TokenStore | 193 - org.opendaylight.aaa.idm-store-h2 - 0.14.6 | Initialized token store with default cache config 2021-11-09T11:35:40,067 | INFO | Blueprint Extender: 1 | StoreBuilder | 188 - org.opendaylight.aaa.authn-api - 0.14.6 | Checking if default entries must be created in IDM store 2021-11-09T11:35:40,067 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv4AddressNoZone 2021-11-09T11:35:40,070 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv4Prefix 2021-11-09T11:35:40,071 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv6AddressNoZone 2021-11-09T11:35:40,072 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv6Prefix 2021-11-09T11:35:40,139 | INFO | Blueprint Extender: 3 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.bgpcep.bgp-peer-acceptor/0.16.10.SNAPSHOT has been started 2021-11-09T11:35:40,139 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer. 2021-11-09T11:35:40,147 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close 2021-11-09T11:35:40,148 | INFO | Blueprint Event Dispatcher: 1 | BlueprintBundleTracker | 204 - org.opendaylight.controller.blueprint - 4.0.6 | Blueprint container for bundle org.opendaylight.bgpcep.bgp-peer-acceptor_0.16.10.SNAPSHOT [378] was successfully created 2021-11-09T11:35:40,166 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started. 2021-11-09T11:35:40,170 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered 2021-11-09T11:35:40,172 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer. 2021-11-09T11:35:40,176 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close 2021-11-09T11:35:40,178 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Stopping 1 StatementProviderActivator instances 2021-11-09T11:35:40,186 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Starting 2 StatementProviderActivator instances 2021-11-09T11:35:40,192 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started. 2021-11-09T11:35:40,197 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered 2021-11-09T11:35:40,209 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer. 2021-11-09T11:35:40,213 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close 2021-11-09T11:35:40,215 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated 2021-11-09T11:35:40,230 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started. which says that BGP deployer is started and closed multiple times which seems rather unusual. |
| Comment by Robert Varga [ 15/Nov/21 ] |
|
That flapping is okay given the startup flux, as we are eagerly registering instances – we might want to add logging for that. |
| Comment by Robert Varga [ 15/Nov/21 ] |
|
Okay, this part is interesting: 2021-11-09T11:35:40,406 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated 2021-11-09T11:35:40,407 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated This would seem to indicate we have two of these running concurrently, i.e. did we synchronize on deployer shutdown? If not, that would explain the verifyNotNull() tripping over. |
| Comment by Robert Varga [ 15/Nov/21 ] |
|
After briefly browsing through the code, it seems shutdown paths in rib-impl are (and were for a long time) rather broken w.r.t. singleton lifecycle. |
| Comment by Olivier Dugeon [ 15/Nov/21 ] |
|
Here it is the sequences I used to install and configure ODL
Regarding the log I upload, it corresponds to a fresh startup after installation and configuration I described above (i.e. stop karaf after step 7, remove karaf.log and start karaf again). Note, that it was always necessary to restart karaf after installing bgp and/or pcep features to get them properly start. |
| Comment by Robert Varga [ 15/Nov/21 ] |
|
Ah, thanks for the note, that is a gap I was unaware of. So after this we should be something like third of the way towards not needing that Karaf restart |
| Comment by Olivier Dugeon [ 18/Nov/21 ] |
|
Patch set solved the problem, but introduce a new regression: it is necessary to restart karaf after adding a new peer in order to get the BGP session up and running. I proceed as usual (see above comment) to install the new compiled version. The problem take places just right after adding a new peer through the REST API. Peer seems correctly added, but BGP session is not fire up. It is necessary to restart karaf to get it up and running. I attached 2 new logs files. Thefirst one, fresh-install.log, captured log when default peers coming from the default configuration file are removed and the new one added. The second one, restart.log, captured the log when karaf is restarted. Note that DEBUG for org.opendaylight.bgpcep.bgp has been enabled before removing peers, but nothing pertinent is revealed and there is no crash and no exception. Just nothing happen. |
| Comment by Marek Zaťko [ 19/Nov/21 ] |
|
I just updated the patch, added few logs to session creation, do you mind trying it again with debug log level before adding your peer and posting results? I tried reproducing your scenario but I couldn't. I tried the following: 2021-11-18T14:29:56,582 | INFO | opendaylight-cluster-data-notification-dispatcher-53 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Creating Peer instance IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}} with configuration: Neighbor{afiSafis=AfiSafis{afiSafi={AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV4UNICAST}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV4UNICAST}, AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV6UNICAST}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV6UNICAST}, AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.openconfig.extensions.rev180329.LINKSTATE}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.openconfig.extensions.rev180329.LINKSTATE}}}, config=Config{peerAs=AsNumber{_value=65522}, peerType=INTERNAL, routeFlapDamping=false, sendCommunity=NONE}, neighborAddress=IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}, routeReflector=RouteReflector{config=Config{routeReflectorClient=true}}, timers=Timers{config=Config{connectRetry=10, holdTime=180, keepaliveInterval=30, minimumAdvertisementInterval=30}}, transport=Transport{config=Config{mtuDiscovery=false, passiveMode=false, augmentation=[NeighborTransportConfig{remotePort=PortNumber{_value=179}}]}}}
2021-11-18T14:29:56,583 | INFO | opendaylight-cluster-data-notification-dispatcher-53 | BgpPeer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Starting BgPeer instance IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}
2021-11-18T14:29:56,584 | INFO | opendaylight-cluster-data-notification-dispatcher-53 | AbstractPeer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Creating DOM peer chain null
2021-11-18T14:29:56,584 | INFO | opendaylight-cluster-data-notification-dispatcher-53 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Peer instance created IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}
Which is all there is for connecting this peer, no session initialize, nothing, suggesting that https://github.com/opendaylight/bgpcep/blob/master/bgp/rib-impl/src/main/java/org/opendaylight/protocol/bgp/rib/impl/config/BgpPeer.java#L334 is false, which seems weird because your neighbor transport config has passiveMode=false. These new logs may bring us more clarity on what exactly is the problem. Thanks |
| Comment by Olivier Dugeon [ 19/Nov/21 ] |
|
Well, this time is working well. I proceed as usual and when adding peer through Rest API, BGP was added and session established. I upload corresponding logs (solved.log), but didn't see anything wrong. Link State topology is correctly retrieved and corresponding Graph correctly fulfill. Don't know if it is due to the new mvn compilation or the log you added. By the way, I setup BGP log to DEBUG before removing default old peers and adding the new one. I let you look to the log in case you found something wrong or some explanations. Regards |