[BGPCEP-898] bgp-topology-provider bundle faills to start Created: 09/Mar/20  Updated: 06/Nov/20  Resolved: 13/Mar/20

Status: Resolved
Project: bgpcep
Component/s: General
Affects Version/s: None
Fix Version/s: Magnesium

Type: Bug Priority: Highest
Reporter: Jamo Luhrsen Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Priority: Highest

 Description   

A regression in Magnesium CSIT was noticed recently. Examples:

Oddly, however, the 3rd job listed above did have a new run that passed, and not sure how
to explain that yet.

Looking at some karaf logs from the failures, I notice some trouble around:

org.opendaylight.bgpcep.bgp-topology-provider/0.13.0.SNAPSHOT

full exception taken from this karaf.log :

2020-03-09T18:06:19,578 | INFO  | Blueprint Extender: 2 | BgpTopologyDeployerImpl          | 241 - org.opendaylight.bgpcep.bgp-topology-provider - 0.13.0 | BGP topology deployer started.
2020-03-09T18:06:19,635 | WARN  | Blueprint Extender: 2 | BeanRecipe                       | 84 - org.apache.aries.blueprint.core - 1.10.2 | Object to be destroyed is not an instance of UnwrapperedBeanHolder, type: null
2020-03-09T18:06:19,639 | INFO  | Blueprint Extender: 2 | BgpTopologyDeployerImpl          | 241 - org.opendaylight.bgpcep.bgp-topology-provider - 0.13.0 | BGP topology deployer stopped.
2020-03-09T18:06:19,640 | ERROR | Blueprint Extender: 2 | BlueprintContainerImpl           | 84 - org.apache.aries.blueprint.core - 1.10.2 | Unable to start container for blueprint bundle org.opendaylight.bgpcep.bgp-topology-provider/0.13.0
org.osgi.service.blueprint.container.ComponentDefinitionException: Error when instantiating bean linkstateGraphProvider of class org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider
	at org.apache.aries.blueprint.container.BeanRecipe.wrapAsCompDefEx(BeanRecipe.java:362) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstanceFromType(BeanRecipe.java:352) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstance(BeanRecipe.java:283) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:685) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:666) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintRepository.createInstances(BlueprintRepository.java:360) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintRepository.createAll(BlueprintRepository.java:190) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.instantiateEagerComponents(BlueprintContainerImpl.java:737) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.doRun(BlueprintContainerImpl.java:433) [84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.run(BlueprintContainerImpl.java:298) [84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.apache.aries.blueprint.container.ExecutorServiceWrapper.run(ExecutorServiceWrapper.java:106) [84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:45) [84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.NullPointerException
	at java.util.Objects.requireNonNull(Objects.java:221) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateGraphBuilder.<init>(LinkstateGraphBuilder.java:123) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider.createTopologyBuilder(LinkstateGraphProvider.java:32) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.createInstance(AbstractBgpTopologyProvider.java:77) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.onTopologyBuilderCreated(AbstractBgpTopologyProvider.java:52) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.BgpTopologyDeployerImpl.lambda$registerTopologyProvider$4(BgpTopologyDeployerImpl.java:113) ~[?:?]
	at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.BgpTopologyDeployerImpl.registerTopologyProvider(BgpTopologyDeployerImpl.java:113) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.<init>(AbstractBgpTopologyProvider.java:44) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider.<init>(LinkstateGraphProvider.java:25) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
	at org.apache.aries.blueprint.utils.ReflectionUtils.newInstance(ReflectionUtils.java:369) ~[?:?]
	at org.apache.aries.blueprint.container.BeanRecipe.newInstance(BeanRecipe.java:839) ~[?:?]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstanceFromType(BeanRecipe.java:350) ~[?:?]
	... 21 more

There were two bgpcep patches [ 1 ][ 2 ] that were merged in the relevant time frame. I took
the distribution created from both and was able to locally produce the same bundle starting
issue with [1], but not [2]. I notice that [1] actually came as a cherry-pick from master and
the master gerrit did run 100% passing gate CSIT. I've tried to trigger CSIT from [1] directly
and when it's done, the logs will appear as comments there.

The trace in karaf.log is in the bgpTopology* area which also points to [1]. [2] is a lot of
docs changes with some changes in algo-impl/

udpates to follow as they come
 



 Comments   
Comment by Jamo Luhrsen [ 09/Mar/20 ]

oh, maybe unrelated and a red herring, but an autorelease failure happened in bgpcep around the same time
as this regression. It passed on the next run, but the failure was this:

[ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 60.493 s <<< FAILURE! - in org.opendaylight.protocol.bgp.rib.impl.GracefulRestartTest
[ERROR] waitForEORonLocalGracefulRestart(org.opendaylight.protocol.bgp.rib.impl.GracefulRestartTest)  Time elapsed: 32.882 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>

just a guess here, but maybe the failure here in the Restart Test was because some bundle(s) did not start properly?

could this be a race condition that does not happen every time, which could also explain why the gate/patch tests jobs
were 100% passing as well as the newer magnesium job that also passsed.

Comment by Jamo Luhrsen [ 09/Mar/20 ]

I was able to reproduce locally with the following featuresBoot features:

odl-restconf,odl-bgpcep-pcep,odl-bgpcep-bgp,odl-bgpcep-bgp-config-example,odl-bgpcep-bmp,odl-bgpcep-bmp-config-example,odl-jolokia
Comment by Ajay Lele [ 10/Mar/20 ]

odd22 the exception seems to be coming while instantiating the linkstateGraphProvider bean from blueprint due to null reference to connectedGraphProvider. Can you please take a look?

Comment by Olivier Dugeon [ 10/Mar/20 ]

Hi,

This is a known issue reported in the documentation bgpcep/doc/pcep/pcep-user-guide-path-computation.rst:

 

Known Bug

When using BGP-LS for automatic Graph topology acquisition, for an undetermined reason, karaf is unable to start properly the bgp-topology-provider bundle. }}{{This is due to karaf that doesn't properly manage blueprint dependencies. Thus, BGP Topology Provider class is initialized with a wrong reference to the GraphTopologyService: a null pointer is provided instead. However, it is easy to overcome this issue by simply restarting the bgp-topology-provider bundle. First identify the bundle number of bgp-topology-provider and check the status.

    opendaylight-user@karaf>bundle:list | grep bgp-topology-provider

    232 │ Failure  │  80 │ 0.14.0          │ bgp-topology-provider

Then restart the bundle if status is Failure

    opendaylight-user@karaf>bundle:restart 232

And finally, verify that the bundle is active

    opendaylight-user@root>bundle:list 232

    START LEVEL 100 , List Threshold: 50

     ID │ State  │ Lvl │ Version         │ Name

    ────┼────────┼─────┼─────────────────┼───────────────────────

    232 │ Active │  80 │ 0.14.0          │ bgp-topology-provider

Looking to the log, you will normally see that a new Graph has been created and fulfil with your network topology element. Using Graph Rest API Get Operational Graph will also validate that all is running correctly.

 

The main problem does not come from the code itself, but rather than from karaf which is unable to correctly schedule the blueprint.

bgp-topology-provider is launched with a null pointer as reference to the GraphServiceProvider while GraphServiceProvider is available or available latter.

It is why a simple bundle:restart is sufficient to overcome the issue.

I will change LinkstateGraphBuilder() constructor to avoid asserting a failure when karaf provide a null pointer for the GraphServiceProvider and fulfil the Graph only if the Graph has been correctly provided. Of course, LinkStateGraphBuilder() will not start properly, but it will not block bgp-topology-provider bundle, and again a bundle:restart will solve the issue.

Regards

Olivier
 

Comment by Robert Varga [ 12/Mar/20 ]

Alright, this one is rather obvious, once we take evidence into account. https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-gate-bgp-ingest-mixed-all-aluminium/6/odl_1/odl1_karaf.log.gz clearly shows:

org.osgi.service.blueprint.container.ComponentDefinitionException: Error when instantiating bean linkstateGraphProvider of class org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider
	at org.apache.aries.blueprint.container.BeanRecipe.wrapAsCompDefEx(BeanRecipe.java:362) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstanceFromType(BeanRecipe.java:352) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstance(BeanRecipe.java:283) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.internalCreate2(BeanRecipe.java:685) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BeanRecipe.internalCreate(BeanRecipe.java:666) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.di.AbstractRecipe$1.call(AbstractRecipe.java:81) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at org.apache.aries.blueprint.di.AbstractRecipe.create(AbstractRecipe.java:90) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintRepository.createInstances(BlueprintRepository.java:360) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintRepository.createAll(BlueprintRepository.java:190) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.instantiateEagerComponents(BlueprintContainerImpl.java:737) ~[84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.doRun(BlueprintContainerImpl.java:433) [84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.container.BlueprintContainerImpl.run(BlueprintContainerImpl.java:298) [84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.apache.aries.blueprint.container.ExecutorServiceWrapper.run(ExecutorServiceWrapper.java:106) [84:org.apache.aries.blueprint.core:1.10.2]
	at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:45) [84:org.apache.aries.blueprint.core:1.10.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.NullPointerException
	at java.util.Objects.requireNonNull(Objects.java:221) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.LinkstateGraphBuilder.<init>(LinkstateGraphBuilder.java:123) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider.createTopologyBuilder(LinkstateGraphProvider.java:34) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.createInstance(AbstractBgpTopologyProvider.java:77) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.onTopologyBuilderCreated(AbstractBgpTopologyProvider.java:52) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.BgpTopologyDeployerImpl.lambda$registerTopologyProvider$4(BgpTopologyDeployerImpl.java:113) ~[?:?]
	at java.util.ArrayList.forEach(ArrayList.java:1540) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.BgpTopologyDeployerImpl.registerTopologyProvider(BgpTopologyDeployerImpl.java:113) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.AbstractBgpTopologyProvider.<init>(AbstractBgpTopologyProvider.java:44) ~[?:?]
	at org.opendaylight.bgpcep.bgp.topology.provider.config.LinkstateGraphProvider.<init>(LinkstateGraphProvider.java:27) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
	at org.apache.aries.blueprint.utils.ReflectionUtils.newInstance(ReflectionUtils.java:369) ~[?:?]
	at org.apache.aries.blueprint.container.BeanRecipe.newInstance(BeanRecipe.java:839) ~[?:?]
	at org.apache.aries.blueprint.container.BeanRecipe.getInstanceFromType(BeanRecipe.java:350) ~[?:?]
	... 21 more

So the problem is that AbstractBgpTopologyProvider is performing registration while LinkstateGraphProvider is being initialized. The field is not set yet, hence accessing it through LinkstateGraphProvider.createTopologyBuilder() finds it null.

Generated at Wed Feb 07 19:14:26 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.