[NETCONF-149] Beryllium RC3: some CSIT jobs aborting. IllegalStateException when loading odl-netconf-client Created: 13/Feb/16  Updated: 15/Mar/19  Resolved: 17/Feb/16

Status: Verified
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jamo Luhrsen Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File 5346.karaf.log.gz    
External issue ID: 5346

 Description   

https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-userfeatures-only-beryllium/1085/
https://jenkins.opendaylight.org/releng/job/capwap-csit-1node-ac-only-beryllium/649
https://jenkins.opendaylight.org/releng/job/netconf-csit-1node-userfeatures-only-beryllium/201/
https://jenkins.opendaylight.org/releng/job/usc-csit-1node-channel-only-beryllium/926/

trace from log (full log of one example attached)

2016-02-13 13:44:54,516 | ERROR | config-pusher | ConfigPusherImpl | 108 - org.opendaylight.controller.config-persister-impl - 0.4.0.Beryllium | Failed to apply configuration snapshot: 01-netconf.xml(odl-netconf-client,odl-netconf-client)
java.lang.IllegalStateException: Unable to resolve value AttributeConfigElement [defaultValue=null, value=MappedDependency{namespace='urn:opendaylight:params:xml:ns:yang:controller:netty', serviceName='netty-threadgroup', refName='global-worker-group'}] to attribute worker-thread-group
at org.opendaylight.controller.config.facade.xml.mapping.config.InstanceConfig.resolveConfiguration(InstanceConfig.java:131)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.InstanceConfig.fromXml(InstanceConfig.java:192)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.ModuleConfig.fromXml(ModuleConfig.java:59)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.Config$1.resolveElement(Config.java:147)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.Config$1.resolveElement(Config.java:144)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.Config.resolveModule(Config.java:224)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.Config.fromXmlModulesResolved(Config.java:152)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.ConfigExecution.getResolvedXmlElements(ConfigExecution.java:49)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.setOnTransaction(ConfigSubsystemFacade.java:228)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.test(ConfigSubsystemFacade.java:165)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.executeTests(ConfigSubsystemFacade.java:149)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.executeConfigExecution(ConfigSubsystemFacade.java:123)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.executeWithMissingModuleFactoryRetries(ConfigPusherImpl.java:329)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:305)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:164)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:129)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.processSingle(ConfigPusherImpl.java:83)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:74)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at org.opendaylight.controller.config.persist.impl.osgi.ConfigPersisterActivator$2.run(ConfigPersisterActivator.java:131)[108:org.opendaylight.controller.config-persister-impl:0.4.0.Beryllium]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
Caused by: java.lang.IllegalArgumentException: No service mapped to urn:opendaylight:params:xml:ns:yang:controller:netty:netty-threadgroup:global-worker-group. Wrong namespace, available namespaces: [urn:opendaylight:params:xml:ns:yang:controller:threadpool, urn:opendaylight:params:xml:ns:yang:controller:netconf:topology:shared:schema:repository, urn:opendaylight:params:xml:ns:yang:controller:shutdown, urn:opendaylight:params:xml:ns:yang:controller:config:netconf]
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)[51:com.google.guava:18.0.0]
at org.opendaylight.controller.config.facade.xml.mapping.config.ServiceRegistryWrapper.getByServiceAndRefName(ServiceRegistryWrapper.java:32)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.attributes.resolving.ObjectNameAttributeResolvingStrategy.parseAttribute(ObjectNameAttributeResolvingStrategy.java:44)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.attributes.fromxml.AttributeConfigElement.resolveValue(AttributeConfigElement.java:46)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
at org.opendaylight.controller.config.facade.xml.mapping.config.InstanceConfig.resolveConfiguration(InstanceConfig.java:127)[106:org.opendaylight.controller.config-manager-facade-xml:0.4.0.Beryllium]
... 19 more



 Comments   
Comment by Jamo Luhrsen [ 13/Feb/16 ]

Sorry, I do not have a full karaf.log. The console output from one of the jobs is attached, which gives the first and last 500K bytes. Note that there are a ton of AuthenticationExceptions, but those are expected, as they come from our bringup script polling restconf.

Comment by Jamo Luhrsen [ 13/Feb/16 ]

Attachment 5346.karaf.log.gz has been added with description: first and last 500K bytes of karaf.log

Comment by Vratko Polak [ 15/Feb/16 ]

These failures do not happen with Beryllium snapshot builds, and they do not happen when testing RC3 on Sandbox [0] either.

The common denominator is the netconf connector. The jobs either install odl-netconf-connector-all (in the job definition), or they install odl-netconf-connector (via odl-usc-channel).

The error message mentions the odl-netconf-client feature, while global-worker-group is defined in the 00-netty.xml config file of the odl-config-netty feature (which is not listed as a dependency of odl-netconf-client).

Probably, the 00-netty.xml is pulled in by some other feature being installed, and it depends on system performance whether ODL detects the transient inconsistency or not.
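
A minimal sketch of the lookup failure described above, assuming a registry keyed by namespace, service name and ref name. The class `ServiceRegistrySketch`, its methods, and all "placeholder-*" values are hypothetical and only mimic the shape of the error in the trace; this is not the real ServiceRegistryWrapper code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of a namespace/service/refName registry.
// It is NOT the real ServiceRegistryWrapper; names and structure are assumptions.
final class ServiceRegistrySketch {
    // namespace -> "serviceName:refName" -> instance name
    private final Map<String, Map<String, String>> byNamespace = new LinkedHashMap<>();

    void register(String namespace, String serviceName, String refName, String instance) {
        byNamespace.computeIfAbsent(namespace, k -> new LinkedHashMap<>())
                   .put(serviceName + ":" + refName, instance);
    }

    String resolve(String namespace, String serviceName, String refName) {
        String key = namespace + ":" + serviceName + ":" + refName;
        Map<String, String> refs = byNamespace.get(namespace);
        if (refs == null) {
            // Same shape as the message in the trace: the namespace itself is unknown.
            throw new IllegalArgumentException("No service mapped to " + key
                    + ". Wrong namespace, available namespaces: " + byNamespace.keySet());
        }
        String instance = refs.get(serviceName + ":" + refName);
        if (instance == null) {
            throw new IllegalArgumentException("No service mapped to " + key);
        }
        return instance;
    }

    public static void main(String[] args) {
        String netty = "urn:opendaylight:params:xml:ns:yang:controller:netty";
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        // Only an unrelated namespace is known so far (placeholder values).
        registry.register("urn:opendaylight:params:xml:ns:yang:controller:threadpool",
                "placeholder-service", "placeholder-ref", "placeholder-instance");
        try {
            registry.resolve(netty, "netty-threadgroup", "global-worker-group");
        } catch (IllegalArgumentException e) {
            System.out.println("Before netty config: " + e.getMessage());
        }
        // Once the netty namespace has been registered, the same lookup succeeds.
        registry.register(netty, "netty-threadgroup", "global-worker-group", "placeholder-worker-group");
        System.out.println("After netty config: "
                + registry.resolve(netty, "netty-threadgroup", "global-worker-group"));
    }
}
```

The point of the sketch is only that the same reference resolves or fails depending on whether the providing config has been applied yet, which would fit the timing dependence suspected here.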

Change 34675 [1] was contributed to address the dependency issue, but there may be more hidden issues of this type; see the comments on ODLPARENT-12.

[0] https://jenkins.opendaylight.org/sandbox/job/netconf-csit-1node-userfeatures-only-beryllium/1
[1] https://git.opendaylight.org/gerrit/34675

Comment by Colin Dixon [ 15/Feb/16 ]

Thanks Vratko!

Is this related?

> 2016-02-13 13:44:49,924 | INFO | Event Dispatcher | YangTextSchemaContextResolver | 69 - org.opendaylight.yangtools.yang-parser-impl - 0.8.0.Beryllium | Provided module name /META-INF/yang/threadpool.yang@0000-00-00.yang does not match actual text threadpool@2013-04-09.yang, corrected

Comment by Vratko Polak [ 15/Feb/16 ]

> https://git.opendaylight.org/gerrit/34675

https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-userfeatures-only-beryllium/206
is queued, and it should verify that this Change does not break snapshot builds.

But I do not see an easy way to verify its supposed effect on autorelease builds without actually making a new autorelease build with the Change merged.

Comment by Vratko Polak [ 15/Feb/16 ]

> Is this related?

No, that is a warning from the new Yangtools parser. After the automatic source name conversion, everything works.

Comment by Jamo Luhrsen [ 15/Feb/16 ]

This means we are re-spinning the release right?

there are other "ready-to-submit" patches out there as well.

Comment by Vratko Polak [ 15/Feb/16 ]

> we are re-spinning the release right?

+1

Comment by A H [ 15/Feb/16 ]

Severity: Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Beryllium without it? Is there a workaround such that we can write a release note and fix in Beryllium SR1?
Testing: Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?
Impact: Does this fix impact any dependent projects?

Comment by Jamo Luhrsen [ 15/Feb/16 ]

(In reply to A H from comment #10)
> Severity: Could you elaborate on the severity of this bug? Is this a
> BLOCKER such that we cannot release Beryllium without it? Is there a
> workaround such that we can write a release note and fix in Beryllium SR1?

I don't know of any workaround, but I didn't really look. IMO this is a
release blocker as it prevents restconf from coming up when specific
user facing features are installed.

> Testing: Could you also elaborate on the testing of this patch? How
> extensively has this patch been tested? Is it covered by any unit tests or
> system tests?

There are the four system test jobs (initial comment) that will give us
good confidence that the patch gets us past this bug.

> Impact: Does this fix impact any dependent projects?

not sure. Hopefully Vratko can add more info here.

Comment by A H [ 15/Feb/16 ]

(In reply to Jamo Luhrsen from comment #11)
> (In reply to A H from comment #10)
> > Severity: Could you elaborate on the severity of this bug? Is this a
> > BLOCKER such that we cannot release Beryllium without it? Is there a
> > workaround such that we can write a release note and fix in Beryllium SR1?
>
> I don't know of any workaround, but I didn't really look. IMO this is a
> release blocker as it prevents restconf from coming up when specific
> user facing features are installed.
>
> > Testing: Could you also elaborate on the testing of this patch? How
> > extensively has this patch been tested? Is it covered by any unit tests or
> > system tests?
>
> There are the four system test jobs (initial comment) that will give us
> good confidence that the patch gets us past this bug.

I would also agree that the four CSIT jobs that failed would cover the testing aspects of this patch.

> > Impact: Does this fix impact any dependent projects?
>
> not sure. Hopefully Vratko can add more info here.

This fix would impact projects whose CSIT jobs failed originally. We will know more once we rerun these jobs after merging the fix.

Comment by Vratko Polak [ 15/Feb/16 ]

> Could you elaborate on the severity of this bug?

When a user installs a feature related to netconf-connector, ODL may break so that even basic Restconf does not work. This is a blocker unless a workaround is applied.

> How extensively has this patch been tested?

This Bug is not reproducible on Sandbox or on Snapshot builds, probably because of a timing issue. The Bug was reproduced on Releng with the original autorelease build.
The fix was only proven to not make things worse. Even if it improves the code, the next respin may fail on a similar issue in other parts of the code.

> Is there a workaround such that we can write a release note and fix in Beryllium SR1?

If the user installs the odl-config-netty feature prior to anything related to netconf-connector, ODL should work.
Due to difficulties with reproducing this Bug, it is not clear whether this workaround actually works.
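
A rough sketch of why the install order in this workaround could matter, assuming config snapshots are applied one after another and each can only reference services provided by an earlier one. `SnapshotOrderSketch`, `Snapshot` and `findUnresolvable` are hypothetical names for illustration and do not reflect the actual ConfigPusherImpl logic (which also retries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: each snapshot provides some service refs and requires others.
final class SnapshotOrderSketch {
    static final class Snapshot {
        final String name;
        final Set<String> provides;
        final Set<String> requires;

        Snapshot(String name, Set<String> provides, Set<String> requires) {
            this.name = name;
            this.provides = provides;
            this.requires = requires;
        }
    }

    // Returns the names of snapshots that require a ref no EARLIER snapshot provided.
    static List<String> findUnresolvable(List<Snapshot> pushOrder) {
        Set<String> available = new HashSet<>();
        List<String> failing = new ArrayList<>();
        for (Snapshot snapshot : pushOrder) {
            if (!available.containsAll(snapshot.requires)) {
                failing.add(snapshot.name);
            }
            available.addAll(snapshot.provides);
        }
        return failing;
    }

    public static void main(String[] args) {
        Snapshot netty = new Snapshot("00-netty.xml",
                Collections.singleton("global-worker-group"), Collections.<String>emptySet());
        Snapshot netconf = new Snapshot("01-netconf.xml",
                Collections.<String>emptySet(), Collections.singleton("global-worker-group"));

        // netconf config applied before the netty config is present: unresolved reference.
        System.out.println(findUnresolvable(Arrays.asList(netconf, netty)));
        // netty config already present (odl-config-netty installed first): nothing fails.
        System.out.println(findUnresolvable(Arrays.asList(netty, netconf)));
    }
}
```

Under this assumption, installing odl-config-netty first simply guarantees that the provider is already in place when 01-netconf.xml is applied.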

> Is it covered by any unit tests or system tests?

No specific system tests, apart from the 4 CSIT jobs that were reported.
No unit tests for this Bug exist right now. Comments in ODLPARENT-12 may enable creation of such unit tests, but probably not before the Beryllium release.

> Does this fix impact any dependent projects?

No, projects were already expecting netconf-connector to install all its dependencies.

Comment by Vratko Polak [ 16/Feb/16 ]

The bgpcep job triggered with autorelease URL on RelEng Jenkins works again [0].
Setting to FIXED.

[0] https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-userfeatures-only-beryllium/1094/console

Comment by A H [ 17/Feb/16 ]

Have we been able to verify the fix for this bug?

Comment by Jamo Luhrsen [ 17/Feb/16 ]

(In reply to A H from comment #15)
> Have we been able to verify the fix for this bug?

Using the CSIT jobs with RC3, we can mark this verified.
