[BGPCEP-680] NPE during singleton startup
Created: 04/Aug/17  Updated: 03/Mar/19  Resolved: 11/Oct/17

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Robert Varga
Assignee: Yrineu Felipe Rodrigues
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File karaf.log    
External issue ID: 8929

 Description   

2017-08-04 17:06:30,736 | ERROR | ult-dispatcher-3 | ClusterSingletonServiceGroupImpl | 183 - org.opendaylight.mdsal.singleton-dom-impl - 2.3.0.Nitrogen | Unexpected exception state for service Provider example-bgp-rib-service-group in TakeLeadership
java.lang.NullPointerException
at org.opendaylight.protocol.bgp.rib.impl.RIBImpl.startLocRib(RIBImpl.java:181)[225:org.opendaylight.bgpcep.bgp-rib-impl:0.8.0.Nitrogen]
at java.lang.Iterable.forEach(Iterable.java:75)[:1.8.0_141]
at org.opendaylight.protocol.bgp.rib.impl.RIBImpl.instantiateServiceInstance(RIBImpl.java:353)[225:org.opendaylight.bgpcep.bgp-rib-impl:0.8.0.Nitrogen]
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceRegistrationDelegator.instantiateServiceInstance(ClusterSingletonServiceRegistrationDelegator.java:47)[183:org.opendaylight.mdsal.singleton-dom-impl:2.3.0.Nitrogen]
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.takeOwnership(ClusterSingletonServiceGroupImpl.java:305)[183:org.opendaylight.mdsal.singleton-dom-impl:2.3.0.Nitrogen]
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:249)[183:org.opendaylight.mdsal.singleton-dom-impl:2.3.0.Nitrogen]
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:146)[183:org.opendaylight.mdsal.singleton-dom-impl:2.3.0.Nitrogen]
at org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:23)[183:org.opendaylight.mdsal.singleton-dom-impl:2.3.0.Nitrogen]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:44)[133:org.opendaylight.controller.sal-distributed-datastore:1.6.0.Nitrogen]
at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:33)[133:org.opendaylight.controller.sal-distributed-datastore:1.6.0.Nitrogen]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:28)[126:org.opendaylight.controller.sal-clustering-commons:1.6.0.Nitrogen]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)[58:com.typesafe.akka.actor:2.4.18]
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)[58:com.typesafe.akka.actor:2.4.18]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)[58:com.typesafe.akka.actor:2.4.18]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[58:com.typesafe.akka.actor:2.4.18]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[58:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[58:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[58:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[58:com.typesafe.akka.actor:2.4.18]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[204:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[204:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[204:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[204:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]

Reproduced using:

feature:install odl-bgpcep-bgp-rib-api
(wait to complete)
feature:install odl-bgpcep-bgp-rib-impl



 Comments   
Comment by Robert Varga [ 04/Aug/17 ]

Attachment karaf.log has been added with description: karaf.log

Comment by Robert Varga [ 04/Aug/17 ]

This is something in:

final ChoiceNode routes = this.ribContextRegistry.getRIBSupportContext(key).getRibSupport().emptyRoutes();

Comment by Yrineu Felipe Rodrigues [ 06/Sep/17 ]

Hi Robert Varga,

I believe the root cause of this issue is concurrency. I checked the parameters and none of them is null. To solve the problem, it may be worth considering a small refactoring to remove some dependencies from the constructor and avoid some static methods. I'm working on that, but if you have any other tips, please let me know.

regards,

Comment by Yrineu Felipe Rodrigues [ 06/Sep/17 ]

Hi Robert Varga,

This problem happens because the 'RIBSupportContextRegistryImpl ribContextRegistry' has no 'TablesKey' registered for IPv4 and IPv6 Unicast:

org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.types.rev130919.Ipv6AddressFamily, _safi=class org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.types.rev130919.UnicastSubsequentAddressFamily

org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.types.rev130919.Ipv4AddressFamily, _safi=class org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.types.rev130919.UnicastSubsequentAddressFamily

Please let me know if you have any tips on how to solve that.

I believe that some sub-module has to register those keys at startup.

regards,

Comment by Yrineu Felipe Rodrigues [ 08/Sep/17 ]

Here is a patch that solves this issue:

https://git.opendaylight.org/gerrit/#/c/62870/2

This issue only happens when those features are installed individually instead of installing 'odl-bgpcep-bgp'.

The patch avoids the NPE when the IPv4Address and IPv6Address Unicast AFIs aren't registered in the RIB context.
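
A minimal sketch of the kind of guard described above, assuming a registry lookup that returns null for a table key no extension has registered. `LocRibGuardSketch`, `Registry`, and the string keys are hypothetical stand-ins for illustration; this is not the content of the linked patch or the actual bgpcep API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for TablesKey and the RIB support registry; not the real bgpcep classes.
final class LocRibGuardSketch {
    /** Simplified registry: returns null when no extension registered the key. */
    static final class Registry {
        private final Map<String, String> contexts = new HashMap<>();

        void register(String tableKey, String context) {
            contexts.put(tableKey, context);
        }

        String getContext(String tableKey) {
            return contexts.get(tableKey);
        }
    }

    /** Starts tables only for keys with a registered context and returns the skipped keys. */
    static List<String> startLocRib(Registry registry, List<String> configuredKeys, List<String> started) {
        List<String> skipped = new ArrayList<>();
        for (String key : configuredKeys) {
            String context = registry.getContext(key);
            if (context == null) {
                // Without this check, dereferencing the missing context would throw an NPE.
                skipped.add(key);
                continue;
            }
            started.add(key);
        }
        return skipped;
    }
}
```

With only an IPv4 unicast context registered, a call configured for both IPv4 and IPv6 unicast would start one table and report the other as skipped instead of failing.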

Comment by Claudio David Gasparini [ 11/Sep/17 ]

Adding conversation under proposed fix to bug

Claudio D. Gasparini
Sep 8 9:44 PM

Patch Set 2:
But it'll happen only if we install just 'odl-bgpcep-rib-api'
and 'odl-bgpcep-rib-impl'. So, the ways to solve that are:
- Install 'odl-bgpcep-bgp', or
- Include 'inet-module' as a dependency of 'odl-bgpcep-rib-impl'
What is your suggestion on that?
Did you check if odl-bgpcep-rib-impl contains inet?
https://git.opendaylight.org/gerrit/gitweb?p=bgpcep.git;a=blob;f=features/bgp/odl-bgpcep-bgp-rib-impl/pom.xml;h=24def18d442697188f4467b04ba5af03a8b8ca75;hb=refs/heads/master
Yrineu Felipe Rodrigues
Sep 8 9:56 PM

Patch Set 2:
Yep, it's there, but there's no specification of module order. So, if you manually install 'bgp-inet' before 'rib-api' and 'rib-impl', there's no NPE.
Is avoiding installing 'rib-api' and 'rib-impl' independently a valid option? I believe that we have to include both features in 'inet'.

Comment by Claudio David Gasparini [ 11/Sep/17 ]

Hi, some notes

Actual

2017-09-09 00:11:06,177 | INFO | pool-2-thread-1 | xtensionProviderContextActivator | 100 - org.opendaylight.bgpcep.bgp-rib-spi - 0.9.0.SNAPSHOT | Starting 0 RIBExtensionProviderActivator instances

2017-09-09 00:11:49,148 | INFO | pool-2-thread-1 | BGPActivator | 222 - org.opendaylight.bgpcep.bgp-inet - 0.9.0.SNAPSHOT | Starting Inet BGP Activator
2017-09-09 00:11:49,149 | INFO | pool-2-thread-1 | RIBActivator | 222 - org.opendaylight.bgpcep.bgp-inet - 0.9.0.SNAPSHOT | Starting Inet RIB Activator

2017-09-09 00:11:49,610 | INFO | pool-2-thread-1 | BgpDeployerImpl | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.9.0.SNAPSHOT | BGP Deployer global-bgp started.

steps

feature:install odl-bgpcep-bgp-rib-api
(wait to complete)
feature:install odl-bgpcep-bgp-rib-impl

=====================================================

Expected

2017-09-09 00:30:55,229 | INFO | pool-2-thread-1 | BGPActivator | 98 - org.opendaylight.bgpcep.bgp-inet - 0.9.0.SNAPSHOT | Starting Inet BGP Activator
2017-09-09 00:30:55,230 | INFO | pool-2-thread-1 | RIBActivator | 98 - org.opendaylight.bgpcep.bgp-inet - 0.9.0.SNAPSHOT | Starting Inet RIB Activator

2017-09-09 00:30:55,247 | INFO | rint Extender: 2 | xtensionProviderContextActivator | 105 - org.opendaylight.bgpcep.bgp-rib-spi - 0.9.0.SNAPSHOT | Starting 1 RIBExtensionProviderActivator instances

2017-09-09 00:31:38,113 | INFO | pool-2-thread-1 | BgpDeployerImpl | 228 - org.opendaylight.bgpcep.bgp-rib-impl - 0.9.0.SNAPSHOT | BGP Deployer global-bgp started.

feature:install odl-bgpcep-bgp-inet
(wait to complete)
feature:install odl-bgpcep-bgp-rib-api
(wait to complete)
feature:install odl-bgpcep-bgp-rib-impl

Conclusion:

The issue here is that SimpleRIBExtensionProviderContext under rib-spi is created and exposes the RIBExtensionProviderContext service before any extension is loaded.

The fix should ensure that all extensions (not only inet) are loaded before the service is exposed. Otherwise, any race condition combined with the BGP Deployer could result in a RIB coming up without support for an extension that was starting at the same time (and this can happen without any exception being thrown, so it will only be caught if the user is actively checking for it).

So we need these steps to be done in order:

1 - Load all extensions
2 - Expose RIBExtensionProviderContext Service
3 - Start BGP Deployer
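
The ordering above can be sketched as a small state check, assuming a single coordinator that owns all three steps. `OrderedStartupSketch` and its methods are hypothetical names for illustration, not existing bgpcep classes, and how "all extensions loaded" is detected is left open here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical startup coordinator; class and method names are illustrative only.
final class OrderedStartupSketch {
    private final List<String> extensions = new ArrayList<>();
    private boolean contextExposed;
    private boolean deployerStarted;

    // Step 1: extensions may only be added before the context is exposed.
    void loadExtension(String name) {
        if (contextExposed) {
            throw new IllegalStateException("context already exposed, too late for " + name);
        }
        extensions.add(name);
    }

    // Step 2: expose a read-only snapshot of everything loaded so far.
    List<String> exposeContext() {
        contextExposed = true;
        return Collections.unmodifiableList(new ArrayList<>(extensions));
    }

    // Step 3: the deployer refuses to start before the context is exposed.
    void startDeployer() {
        if (!contextExposed) {
            throw new IllegalStateException("context not exposed yet");
        }
        deployerStarted = true;
    }

    boolean isDeployerStarted() {
        return deployerStarted;
    }
}
```

Calling startDeployer() before exposeContext() fails fast in this sketch, which turns the silent race described above into a visible error.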

As you can see, the fix involves more than fixing the NPE, and needs more thought. I hope this helps.

Regards,

Claudio

Comment by Robert Varga [ 25/Sep/17 ]

[25/09/2017 18:58:38] <yrineu_rodrigues> let me ask you... there is some way to choose the 'load order' of each module?
[25/09/2017 19:33:06] <rovarga> hello Yrineu
[25/09/2017 19:33:22] <rovarga> not sure about blueprint
[25/09/2017 19:34:04] <rovarga> the problem with karaf/osgi is that we do not really know when all plugins have been registered
[25/09/2017 19:34:51] <rovarga> I'll follow up in the issue

So the issue with registry-type things is that OSGi and karaf's feature:install approach does not really tell us when loading is complete (there is an API in 4.1.x which will help a bit), but even that is only valid until the user types feature:install.

In any case, we should be able to start loc-rib without any extensions – so that part is a bug.

Going forward, I am not sure it was implemented, but the idea was for the registry to signal when a new extension is loaded and the BGP sessions would be flapped (which is not nice, but gets the job done).
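
A minimal sketch of that signalling idea, assuming a plain listener callback. `ExtensionSignalSketch`, `ExtensionRegistry`, `ExtensionListener`, and `Session` are hypothetical names for illustration; this does not reflect what bgpcep actually implements.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical names throughout; this is not the bgpcep registry API.
final class ExtensionSignalSketch {
    /** Callback invoked whenever an extension is registered. */
    interface ExtensionListener {
        void onExtensionLoaded(String extension);
    }

    /** Registry that notifies listeners about newly loaded extensions. */
    static final class ExtensionRegistry {
        private final List<ExtensionListener> listeners = new CopyOnWriteArrayList<>();
        private final List<String> extensions = new CopyOnWriteArrayList<>();

        void addListener(ExtensionListener listener) {
            listeners.add(listener);
        }

        void registerExtension(String extension) {
            extensions.add(extension);
            for (ExtensionListener listener : listeners) {
                listener.onExtensionLoaded(extension);
            }
        }
    }

    /** Session stand-in that counts how often it was restarted ("flapped"). */
    static final class Session implements ExtensionListener {
        private int restarts;

        @Override
        public void onExtensionLoaded(String extension) {
            // A real session would tear down and re-establish the peering here.
            restarts++;
        }

        int restarts() {
            return restarts;
        }
    }
}
```

Each registered extension triggers one restart per listening session in this sketch, which is the "not nice, but gets the job done" trade-off mentioned above.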

Another option is to have rib-impl check its configuration and figure out the afis/safis required for startup – before starting netty.

Comment by Yrineu Felipe Rodrigues [ 25/Sep/17 ]

@Claudio and @Robert, does it make sense to create a new karaf module that installs, in a single place, all the contents of:
feature:install odl-bgpcep-bgp-inet
feature:install odl-bgpcep-bgp-rib-api
feature:install odl-bgpcep-bgp-rib-impl

Something like 'odl-bgpcep-bgp-extensions', which would include all dependencies of the features above. That way we can ensure that all extensions are loaded and avoid installing features with missing extensions.

Please let me know what you think about that.

Comment by Robert Varga [ 25/Sep/17 ]

I think it would be okay if bgp/rib-impl depended on bgp/inet. While inet is independent as an implementation, it is also part of RFC4760, which is what we claim to support by default.

Aside from this, bgp/inet should not be special, i.e. it should remain a plugin outside of bgp/parser-impl.

Comment by Yrineu Felipe Rodrigues [ 26/Sep/17 ]

Hi @Robert and @Claudio,

Here is a patch with a proposed fix:
https://git.opendaylight.org/gerrit/#/c/63537/

PS: Still waiting for Jenkins

Comment by Claudio David Gasparini [ 10/Oct/17 ]

Hi Yrineu,

The proposed patch to fix the NPE
https://git.opendaylight.org/gerrit/#/q/topic:BUG-8929
has been merged.

Regarding your proposal to force odl-bgpcep-bgp-rib-impl to be part of odl-bgpcep-bgp-inet: they should keep their dependency, and this does not fix any issue.

The issue to be fixed is that when a new extension is installed, the sessions that are already started are not restarted to load the newly supported extension.
Once that is fixed, inet and the rest of the extensions will be ready to use, no matter when or in which order they are installed.
This also gives the user the freedom to use whichever implementation of an extension they want.

I'll open a bug with defined tasks and requirements to solve the issue described above.

Regarding this bug, I'll close it as soon as I can check that your patch solved the NPE. If you have already tested that

feature:install odl-bgpcep-bgp-rib-api
(wait to complete)
feature:install odl-bgpcep-bgp-rib-impl

no longer causes the NPE, feel free to close it.

Regards,

Comment by Yrineu Felipe Rodrigues [ 11/Oct/17 ]

Thanks for your update!
