[BGPCEP-988] Regression with BGP Link State Created: 09/Nov/21  Updated: 21/Nov/21  Resolved: 21/Nov/21

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: 0.16.8, 0.16.9
Fix Version/s: 0.16.11

Type: Bug Priority: High
Reporter: Olivier Dugeon Assignee: Marek Zaťko
Resolution: Done Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

BGP speaker is a Juniper MX 960 with JunOS 21.1R1.11.

Karaf is a fresh compilation of BGPCEP master.

Note that the problem does not occur with Phosphorus release.


Attachments: File fresh-install.log     File karaf.log     File restart.log     File solved.log    
Issue Links:
Relates
relates to BGPCEP-958 Remove blueprint from bgp-rib-impl Resolved
relates to BGPCEP-992 Refactor DefaultBgpDeployer Confirmed
Priority: Highest

 Description   

Since release bgpcep-0.16.8, BGP is failing to start properly and establish a BGP session with a peer for Link State family.

 

A fist set of error occurs when BGP Topology Deployer started:

opendaylight-cluster-data-akka.actor.default-dispatcher-5 | ClusterSingletonServiceGroupImpl | 261 - org.opendaylight.mdsal.singleton-dom-impl - 8.0.7 | Service group example-bgp-rib-service-group service org.opendaylight.protocol.bgp.rib.impl.config.BGPClusterSingletonService@3a8d67e1 failed to start, attempting to continue
com.google.common.base.VerifyException: expected a non-null reference

 

Then a problem occurs with data store when adding a peer:

org.opendaylight.mdsal.common.api.OptimisticLockFailedException: Optimistic lock failed for path /(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)bgp-rib/rib/rib[\{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)id=example-bgp-rib}]/loc-rib/tables/tables[\{(urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)afi=(urn:opendaylight:params:xml:ns:yang:bgp-linkstate?revision=2020-01-20)linkstate-address-family, (urn:opendaylight:params:xml:ns:yang:bgp-rib?revision=2018-03-29)safi=(urn:opendaylight:params:xml:ns:yang:bgp-linkstate?revision=2020-01-20)linkstate-subsequent-address-family}]/attributes

This conduct to many errors when attempting to write Link State information into the Datastore.

Full log is attached.



 Comments   
Comment by Marek Zaťko [ 11/Nov/21 ]

Hi Olivier, can you please describe step by step, how are you starting the karaf and which bundles/features are you installing upon startup? Because I was not able to reproduce this issue, I tried felix and equinox. Reason I'm asking this is because in provided logs I see this chunk at startup:

 2021-11-09T11:35:39,861 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started.
 2021-11-09T11:35:39,885 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Stopping 0 StatementProviderActivator instances
 2021-11-09T11:35:39,915 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Starting 1 StatementProviderActivator instances
 2021-11-09T11:35:39,957 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.bgpcep.bgp-peer-acceptor/0.16.10.SNAPSHOT is waiting for dependencies [(&(|(type=default)(!(type=*)))(objectClass=org.opendaylight.mdsal.binding.api.DataBroker))]
 2021-11-09T11:35:39,974 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered
 2021-11-09T11:35:39,994 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started.
 2021-11-09T11:35:40,002 | INFO | opendaylight-cluster-data-notification-dispatcher-41 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered
 2021-11-09T11:35:40,035 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.netconf.restconf-nb-bierman02/2.0.8 is waiting for dependencies [(objectClass=org.opendaylight.aaa.web.WebContextSecurer)]
 2021-11-09T11:35:40,037 | INFO | Blueprint Extender: 2 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.netconf.restconf-nb-rfc8040/2.0.8 is waiting for dependencies [(objectClass=org.opendaylight.aaa.web.WebContextSecurer)]
 2021-11-09T11:35:40,037 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-32 | OSGiDistributedDataStore | 220 - org.opendaylight.controller.sal-distributed-datastore - 4.0.6 | Distributed Datastore type CONFIGURATION started
 2021-11-09T11:35:40,046 | INFO | opendaylight-cluster-data-notification-dispatcher-41 | LazyBindingMap | 226 - org.opendaylight.mdsal.binding-dom-codec - 8.0.7 | Using lazy population for maps larger than 1 element(s)
 2021-11-09T11:35:40,064 | INFO | Blueprint Extender: 1 | H2TokenStore | 193 - org.opendaylight.aaa.idm-store-h2 - 0.14.6 | Initialized token store with default cache config
 2021-11-09T11:35:40,067 | INFO | Blueprint Extender: 1 | StoreBuilder | 188 - org.opendaylight.aaa.authn-api - 0.14.6 | Checking if default entries must be created in IDM store
 2021-11-09T11:35:40,067 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv4AddressNoZone
 2021-11-09T11:35:40,070 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv4Prefix
 2021-11-09T11:35:40,071 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv6AddressNoZone
 2021-11-09T11:35:40,072 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | StringValueObjectFactory | 236 - org.opendaylight.mdsal.binding-spec-util - 8.0.7 | Instantiated factory for class org.opendaylight.yang.gen.v1.urn.ietf.params.xml.ns.yang.ietf.inet.types.rev130715.Ipv6Prefix
 2021-11-09T11:35:40,139 | INFO | Blueprint Extender: 3 | BlueprintContainerImpl | 27 - org.apache.aries.blueprint.core - 1.10.3 | Blueprint bundle org.opendaylight.bgpcep.bgp-peer-acceptor/0.16.10.SNAPSHOT has been started
 2021-11-09T11:35:40,139 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer.
 2021-11-09T11:35:40,147 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close
 2021-11-09T11:35:40,148 | INFO | Blueprint Event Dispatcher: 1 | BlueprintBundleTracker | 204 - org.opendaylight.controller.blueprint - 4.0.6 | Blueprint container for bundle org.opendaylight.bgpcep.bgp-peer-acceptor_0.16.10.SNAPSHOT [378] was successfully created
 2021-11-09T11:35:40,166 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started.
 2021-11-09T11:35:40,170 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered
 2021-11-09T11:35:40,172 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer.
 2021-11-09T11:35:40,176 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close
 2021-11-09T11:35:40,178 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Stopping 1 StatementProviderActivator instances
 2021-11-09T11:35:40,186 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | OSGiStatementRegistry | 370 - org.opendaylight.bgpcep.bgp-openconfig-rp-spi - 0.16.10.SNAPSHOT | Starting 2 StatementProviderActivator instances
 2021-11-09T11:35:40,192 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started.
 2021-11-09T11:35:40,197 | INFO | opendaylight-cluster-data-notification-dispatcher-40 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group registered
 2021-11-09T11:35:40,209 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | Closing BGP Deployer.
 2021-11-09T11:35:40,213 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group close
 2021-11-09T11:35:40,215 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated
 2021-11-09T11:35:40,230 | INFO | Start Level: Equinox Container: a81c060d-3c5c-4444-a60e-1b22e133ec93 | DefaultBgpDeployer | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGP Deployer global-bgp started.
 

which says that BGP deployer is started and closed multiple times which seems rather unusual.

Comment by Robert Varga [ 15/Nov/21 ]

That flapping is okay given the startup flux, as we are eagerly registering instances – we might want to add logging for that.

Comment by Robert Varga [ 15/Nov/21 ]

Okay, this part is interesting:

2021-11-09T11:35:40,406 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService       | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated
2021-11-09T11:35:40,407 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-5 | BGPClusterSingletonService       | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.10.SNAPSHOT | BGPClusterSingletonService example-bgp-rib-service-group instantiated

This would seem to indicate we have two of these running concurrently, i.e. did we synchronize on deployer shutdown? If not, that would explain the verifyNotNull() tripping over.

Comment by Robert Varga [ 15/Nov/21 ]

After briefly browsing through the code, it seems shutdown paths in rib-impl are (and were for a long time) rather broken w.r.t. singleton lifecycle.
Migration from blueprint to SCR, where components are shut down/restarted as needed is flushing these out. Blueprint was hiding these problems by never issuing a restart.

Comment by Olivier Dugeon [ 15/Nov/21 ]

Here it is the sequences I used to install and configure ODL

  1. untar bgpcep karaf
  2. launch karaf
  3. install restconf feature with feature:install features-restconf and bgp feature with feature:install features-bgp
  4. stop karaf and replace the etc/opendaylight/bgpcep/protocol-configs.xml with mine setting (mostly change the AS number and IP address of ODL and disable BGP peer)
  5. start again karaf
  6. remove old BGP peers (the ones that figure in the original protocol-configs.xml)
  7. install the BGP peer that corresponds to my Juniper router

Regarding the log I upload, it corresponds to a fresh startup after installation and configuration I described above (i.e. stop karaf after step 7, remove karaf.log and start karaf again).

Note, that it was always necessary to restart karaf after installing bgp and/or pcep features to get them properly start.

Comment by Robert Varga [ 15/Nov/21 ]

Ah, thanks for the note, that is a gap I was unaware of. So after this we should be something like third of the way towards not needing that Karaf restart

Comment by Olivier Dugeon [ 18/Nov/21 ]

Patch set solved the problem, but introduce a new regression: it is necessary to restart karaf after adding a new peer in order to get the BGP session up and running.

I proceed as usual (see above comment) to install the new compiled version. The problem take places just right after adding a new peer  through the REST API. Peer seems correctly added, but BGP session is not fire up. It is necessary to restart karaf to get it up and running. I attached 2 new logs files. Thefirst one, fresh-install.log, captured log when default peers coming from the default configuration file are removed and the new one added. The second one, restart.log, captured the log when karaf is restarted. Note that DEBUG for org.opendaylight.bgpcep.bgp has been enabled before removing peers, but nothing pertinent is revealed and there is no crash and no exception. Just nothing happen.

Comment by Marek Zaťko [ 19/Nov/21 ]

I just updated the patch, added few logs to session creation, do you mind trying it again with debug log level before adding your peer and posting results? I tried reproducing your scenario but I couldn't. I tried the following:
1. feature:install features-restconf features-bgp
2. shut down karaf
3. Modify etc/opendaylight/bgpcep/protocol-configs.xml to contain my desired configuration with new neighbor
4. start karaf
Neighbor is connecting without any issues in my scenario. I also tried connecting new neighbor via RESTCONF upon restarting karaf, session is initialized just fine for me.
In your fresh-install logs the relevant piece is this:

2021-11-18T14:29:56,582 | INFO  | opendaylight-cluster-data-notification-dispatcher-53 | BGPClusterSingletonService       | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Creating Peer instance IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}} with configuration: Neighbor{afiSafis=AfiSafis{afiSafi={AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV4UNICAST}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV4UNICAST}, AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV6UNICAST}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.http.openconfig.net.yang.bgp.types.rev151009.IPV6UNICAST}, AfiSafiKey{_afiSafiName=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.openconfig.extensions.rev180329.LINKSTATE}=AfiSafi{afiSafiName=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.openconfig.extensions.rev180329.LINKSTATE}}}, config=Config{peerAs=AsNumber{_value=65522}, peerType=INTERNAL, routeFlapDamping=false, sendCommunity=NONE}, neighborAddress=IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}, routeReflector=RouteReflector{config=Config{routeReflectorClient=true}}, timers=Timers{config=Config{connectRetry=10, holdTime=180, keepaliveInterval=30, minimumAdvertisementInterval=30}}, transport=Transport{config=Config{mtuDiscovery=false, passiveMode=false, augmentation=[NeighborTransportConfig{remotePort=PortNumber{_value=179}}]}}}
2021-11-18T14:29:56,583 | INFO  | opendaylight-cluster-data-notification-dispatcher-53 | BgpPeer                          | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Starting BgPeer instance IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}
2021-11-18T14:29:56,584 | INFO  | opendaylight-cluster-data-notification-dispatcher-53 | AbstractPeer                     | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Creating DOM peer chain null
2021-11-18T14:29:56,584 | INFO  | opendaylight-cluster-data-notification-dispatcher-53 | BGPClusterSingletonService       | 380 - org.opendaylight.bgpcep.bgp-rib-impl - 0.16.11.SNAPSHOT | Peer instance created IpAddress{_ipv4Address=Ipv4Address{_value=10.194.77.242}}

Which is all there is for connecting this peer, no session initialize, nothing, suggesting that https://github.com/opendaylight/bgpcep/blob/master/bgp/rib-impl/src/main/java/org/opendaylight/protocol/bgp/rib/impl/config/BgpPeer.java#L334 is false, which seems weird because your neighbor transport config has passiveMode=false. These new logs may bring us more clarity on what exactly is the problem. Thanks

Comment by Olivier Dugeon [ 19/Nov/21 ]

Well, this time is working well.

I proceed as usual and when adding peer through Rest API, BGP was added and session established. I upload corresponding logs (solved.log), but didn't see anything wrong. Link State topology is correctly retrieved and corresponding Graph correctly fulfill. Don't know if it is due to the new mvn compilation or the log you added. By the way, I setup BGP log to DEBUG before removing default old peers and adding the new one. I let you look to the log in case you found something wrong or some explanations.

Regards

Generated at Wed Feb 07 19:14:42 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.