[NETCONF-358] Fix failures in Karaf 4 feature tests Created: 02/Mar/17  Updated: 15/Mar/19  Resolved: 31/Jul/17

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Jakub Morvay
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks NETCONF-412 Feature operations at Karaf4 trigger ... Resolved
is blocked by ODLPARENT-90 Karaf 4 distribution unusable after b... Resolved
is blocked by AAA-126 Unable to start blueprint container f... Verified
Duplicate
is duplicated by NETCONF-414 odl-netconf-mdsal feature does not wo... Resolved
External issue ID: 7891

 Description   

In a recent merge job, while testing features4-netconf-connector, the SingleFeatureTest passed, but there are many ERRORs in the log [0].

I suspect this causes failure [1] in Integration/Distribution karaf patch.

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/193/archives/features/netconf-connector/features4-netconf-connector/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz
[1] https://jenkins.opendaylight.org/releng/job/distribution-verify-carbon/org.opendaylight.integration$features-index/37/testReport/junit/installFeature(org.opendaylight.odlparent.featuretest.SingleFeatureTest)[repoUrl_%20file__w_workspace_distribution-verify-carbon_features_repos_index_target_feature_feature.xml,%20Feature_%20features4-netconf-connector%201.2.0/SNAPSHOT]/installFeature/



 Comments   
Comment by Colin Dixon [ 24/Apr/17 ]

Has this been fixed? Do we know?

Comment by Vratko Polak [ 25/Apr/17 ]

Currently we do not know; the feature repos are still commented out of the Karaf 4 feature-index.

Comment by Vratko Polak [ 25/Apr/17 ]

Oh, I had not realized that this is visible even without being tested by distribution-check.

For example, considering features4-netconf-connector, in recent surefire log [2] I see "Transaction failed on 2PC, server is unhealthy", which would fail distribution-check.

But I believe most of the errors come [3] from features4-config-netty of Controller project.
After fixing odl-config-netty, other faulty dependencies might become visible.

[2] https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/248/archives/features/netconf-connector/features4-netconf-connector/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz
[3] https://logs.opendaylight.org/releng/jenkins092/controller-merge-carbon/428/archives/features/config-netty/features4-config-netty/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz
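
To check where most of the errors come from, ERROR lines can be tallied per bundle. This is only a minimal sketch: it assumes the pipe-separated Karaf log layout seen in the excerpts pasted in this issue (timestamp | level | thread | logger | bundle | message), and the function name errors_by_bundle is hypothetical. The sample lines are copied from log excerpts in this issue.

```python
from collections import Counter

def errors_by_bundle(lines):
    """Count ERROR lines per bundle symbolic name (assumed column layout)."""
    counts = Counter()
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 6 and parts[1] == "ERROR":
            # The bundle column looks like "<id> - <symbolic name> - <version>".
            fields = parts[4].split(" - ")
            name = fields[1] if len(fields) >= 2 else parts[4]
            counts[name] += 1
    return counts

SAMPLE_LINES = [
    "2017-04-28 12:37:01,523 | ERROR | pool-1-thread-2 | BootFeaturesInstaller | 7 - org.apache.karaf.features.core - 4.0.9 | Error installing boot features",
    "2017-07-28 07:57:09,451 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | Stopping bundles:",
    "2017-07-28 07:57:09,466 | ERROR | config-pusher | ConfigRegistryImpl | 100 - org.opendaylight.controller.config-manager - 0.7.0.SNAPSHOT | Configuration Transaction failed on 2PC, server is unhealthy",
]

for bundle, count in errors_by_bundle(SAMPLE_LINES).most_common():
    print(count, bundle)
```

Stack trace lines and other continuation lines have no pipe-separated columns, so the sketch skips them rather than attributing them to a bundle.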

Comment by Vratko Polak [ 25/Apr/17 ]

> distribution-check

To be more specific:

{project}-distribution4-check-carbon would be the job to prevent {project} from messing up features already included in Karaf 4 odl-integration-all.
distribution4-check-carbon is the job which should reject adding faulty features to Karaf 4 odl-integration-all.

Comment by Andrej Mak [ 27/Apr/17 ]

"Transaction failed on 2PC, server is unhealthy" is caused by a restart during feature install. The restart is triggered when odl-mdsal-broker-local is installed in the netconf karaf distribution.
However, after this restart, features seem to be working well.

Comment by Colin Dixon [ 27/Apr/17 ]

Is this fixed?

Comment by Vratko Polak [ 27/Apr/17 ]

Testing this was limited by slow distribution4-check jobs.
We will know more when this [4] one finishes.

[4] https://jenkins.opendaylight.org/releng/job/distribution4-check-carbon/21

Comment by Vratko Polak [ 27/Apr/17 ]

Actually it is worse. After ODLPARENT-75, the netconf merge job shows SFT failures on multiple features, similar to NETCONF-411.

The good news is that netconf devs now know what exactly is missing, so they can start fixing it.

Comment by Vratko Polak [ 28/Apr/17 ]

Good news:

  • Netconf merge job is passing.
  • Netconf distribution4-check is passing (but no non-restconf netconf feature in distribution).
  • Change to add netconf feature repos [5] passed its verify jobs.

Bad news:

  • I still see errors in surefire reports.
    • For example "server is unhealthy" [6] would very probably fail distribution4-check if the feature is added to odl-integration-all.

[5] https://git.opendaylight.org/gerrit/56036
[6] https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/254/archives/features/netconf-connector/odl-netconf-clustered-topology/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz

Comment by Andrej Mak [ 28/Apr/17 ]

I've added netconf features to odl-integration-all [0]. Distribution4-check [1] failed with the following error, which is probably unrelated to the restarts mentioned above.

2017-04-28 12:37:01,523 | ERROR | pool-1-thread-2 | BootFeaturesInstaller | 7 - org.apache.karaf.features.core - 4.0.9 | Error installing boot features
org.osgi.service.resolver.ResolutionException: Unable to resolve root: missing requirement [root] osgi.identity; osgi.identity=odl-netconf-tcp; type=karaf.feature; version=1.2.0.SNAPSHOT; filter:="(&(osgi.identity=odl-netconf-tcp)(type=karaf.feature)(version>=1.2.0.SNAPSHOT))" [caused by: Unable to resolve odl-netconf-tcp/1.2.0.SNAPSHOT: missing requirement [odl-netconf-tcp/1.2.0.SNAPSHOT] osgi.identity; osgi.identity=odl-netconf-impl; type=karaf.feature; version="[1.2.0.SNAPSHOT,1.2.0.SNAPSHOT]" [caused by: Unable to resolve odl-netconf-impl/1.2.0.SNAPSHOT: missing requirement [odl-netconf-impl/1.2.0.SNAPSHOT] osgi.identity; osgi.identity=odl-netconf-netty-util; type=karaf.feature; version="[1.2.0.SNAPSHOT,1.2.0.SNAPSHOT]" [caused by: Unable to resolve odl-netconf-netty-util/1.2.0.SNAPSHOT: missing requirement [odl-netconf-netty-util/1.2.0.SNAPSHOT] osgi.identity; osgi.identity=org.opendaylight.netconf.netty-util; type=osgi.bundle; version="[1.2.0.SNAPSHOT,1.2.0.SNAPSHOT]"; resolution:=mandatory [caused by: Unable to resolve org.opendaylight.netconf.netty-util/1.2.0.SNAPSHOT: missing requirement [org.opendaylight.netconf.netty-util/1.2.0.SNAPSHOT] osgi.wiring.package; filter:="(&(osgi.wiring.package=org.apache.sshd)(version>=0.14.0)(!(version>=1.0.0)))"]]]]

[0] https://git.opendaylight.org/gerrit/#/c/56036/
[1] https://logs.opendaylight.org/releng/jenkins092/distribution4-check-carbon/31/console.log.gz
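
The nested "caused by" chain in the ResolutionException above is easier to read when unwound into its individual hops. The following is a minimal sketch, not part of any Karaf or OSGi tooling: the function names are hypothetical, the regular expressions are assumptions based only on the message format seen above, and SAMPLE is an abridged copy of that message with the middle hops left out.

```python
import re

# One hop of the chain: "missing requirement [<owner>] <namespace>;"
REQUIREMENT = re.compile(r"missing requirement \[([^\]]+)\] ([\w.]+);")

def requirement_chain(message):
    """Return (owner, namespace) pairs, outermost hop first."""
    return REQUIREMENT.findall(message)

def root_cause(message):
    """Return (owner, namespace, package) for the innermost hop, or None."""
    chain = requirement_chain(message)
    if not chain:
        return None
    owner, namespace = chain[-1]
    package = re.search(r"osgi\.wiring\.package=([\w.]+)", message)
    return (owner, namespace, package.group(1) if package else None)

# Abridged from the log line above (middle hops removed).
SAMPLE = (
    "Unable to resolve root: missing requirement [root] osgi.identity; "
    "osgi.identity=odl-netconf-tcp; type=karaf.feature "
    "[caused by: Unable to resolve odl-netconf-netty-util/1.2.0.SNAPSHOT: "
    "missing requirement [odl-netconf-netty-util/1.2.0.SNAPSHOT] osgi.identity; "
    "osgi.identity=org.opendaylight.netconf.netty-util; type=osgi.bundle "
    "[caused by: Unable to resolve org.opendaylight.netconf.netty-util/1.2.0.SNAPSHOT: "
    "missing requirement [org.opendaylight.netconf.netty-util/1.2.0.SNAPSHOT] "
    'osgi.wiring.package; filter:="(&(osgi.wiring.package=org.apache.sshd)'
    '(version>=0.14.0)(!(version>=1.0.0)))"]]'
)

print(root_cause(SAMPLE))
```

Read this way, the innermost hop is the netty-util bundle failing to import the org.apache.sshd package, with the filter accepting versions from 0.14.0 up to but excluding 1.0.0.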

Comment by Vratko Polak [ 28/Apr/17 ]

Andrej has created a patch [7], which dances around exclusion of sshd-core.
So perhaps making sure excluded artifacts are installed in "empty" karaf is needed: [8].

[7] https://git.opendaylight.org/gerrit/56255
[8] https://git.opendaylight.org/gerrit/56262

Comment by Andrej Mak [ 03/May/17 ]

A ResolutionException is thrown when a netconf feature is placed in featuresBoot in org.apache.karaf.features.cfg, regardless of the presence of the ssh feature. Installing it from the console via feature:install works, though.

Comment by Andrej Mak [ 03/May/17 ]

https://git.opendaylight.org/gerrit/#/c/56421/

Comment by Colin Dixon [ 03/May/17 ]

That patch isn't passing verify:

org.opendaylight.odlparent.featuretest.SingleFeatureTest
10:22:18 installFeatureCatchAndLog(org.opendaylight.odlparent.featuretest.SingleFeatureTest)[repoUrl: file:/w/workspace/netconf-verify-nitrogen-mvn33-openjdk8/features/netconf-connector/features4-netconf-connector/target/feature/feature.xml, Feature: features4-netconf-connector 1.3.0.SNAPSHOT] Time elapsed: 58.96 sec <<< ERROR!
10:22:18 org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.
10:22:18 at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkConflicting(SchemaAwareApplyOperation.java:78)

Comment by Andrej Mak [ 04/May/17 ]

carbon: https://git.opendaylight.org/gerrit/#/c/56460/

Comment by Andrej Mak [ 04/May/17 ]

The Carbon fix was merged. However, verification is blocked by AAA-126, since it causes failures in distribution4-check.
https://jenkins.opendaylight.org/releng/job/distribution4-check-carbon/66/console

Comment by Andrej Mak [ 05/May/17 ]

ResolutionException is no longer present in the distribution4-check logs [9] after merging [10]. The test still fails on the "server is unhealthy" check, which is caused by restarting config bundles. Bundle restarting is discussed on the Release mailing list [11].

[9] https://jenkins.opendaylight.org/releng/job/distribution4-check-carbon/72/console
[10] https://git.opendaylight.org/gerrit/#/c/56460/
[11] https://lists.opendaylight.org/pipermail/release/2017-May/010653.html

Comment by Vratko Polak [ 24/Jul/17 ]

This is still preventing Netconf features from being included in the Nitrogen Distribution.

The most recent unsuccessful fix: [12].

[12] https://git.opendaylight.org/gerrit/60669

Comment by Jakub Morvay [ 28/Jul/17 ]

I only have logs from the failing distribution-check job on patch https://git.opendaylight.org/gerrit/#/c/59607/. Specifically, the job https://jenkins.opendaylight.org/releng/job/distribution-check-nitrogen/250/.

I guess the problem could be that the config-persister-impl bundle is being refreshed, which interrupts the initialization of netty-threadgroup-fixed and ends with the "Server is unhealthy" error being emitted.

2017-07-28 07:57:09,451 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | Stopping bundles:
.
.
2017-07-28 07:57:09,454 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | org.opendaylight.controller.config-persister-impl/0.7.0.SNAPSHOT

After that we see the "Server is unhealthy" message:

2017-07-28 07:57:09,466 | ERROR | config-pusher | ConfigRegistryImpl | 100 - org.opendaylight.controller.config-manager - 0.7.0.SNAPSHOT | Configuration Transaction failed on 2PC, server is unhealthy
java.lang.IllegalStateException: Error - getInstance() failed for ModuleIdentifier{factoryName='netty-threadgroup-fixed', instanceName='global-worker-group'} in transaction TransactionIdentifier{name='ConfigTransaction-5-7'}
at
...
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:291)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:159)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:124)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.processSingle(ConfigPusherImpl.java:78)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:70)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at org.opendaylight.controller.config.persist.impl.osgi.ConfigPersisterActivator$2.run(ConfigPersisterActivator.java:131)[105:org.opendaylight.controller.config-persister-impl:0.7.0.SNAPSHOT]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_131]
Caused by: org.opendaylight.controller.config.api.osgi.ServiceNotFoundException: Wait for OSGi service interface io.netty.channel.EventLoopGroup was interrrupted
at org.opendaylight.controller.config.api.osgi.WaitingServiceTracker.waitForService(WaitingServiceTracker.java:54)
at org.opendaylight.controller.config.yang.netty.threadgroup.NettyThreadgroupModule.createInstance(NettyThreadgroupModule.java:58)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_131]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_131]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_131]
at java.lang.reflect.Method.invoke(Method.java:498)[:1.8.0_131]
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:148)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:84)
at com.sun.proxy.$Proxy39.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:439)
... 35 more

From the logs I cannot really tell what exactly caused the refresh of the config-persister-impl bundle. The NETCONF project's features shouldn't depend on that kind of CSS stuff anymore.
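
The ordering argument above (bundle stopped, then "server is unhealthy") can be checked mechanically by measuring the gap between the two log lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the pipe-separated layout and timestamp format are inferred from the excerpts above, and the sketch only looks for the bundle name in a message without verifying that the line belongs to the "Stopping bundles:" list.

```python
from datetime import datetime, timedelta

def parse_karaf_line(line):
    """Split a Karaf log line into (timestamp, level, message), or None."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 6:
        return None
    try:
        stamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return stamp, parts[1], " | ".join(parts[5:])

def gap_after_bundle_stop(lines, bundle, marker="server is unhealthy"):
    """Milliseconds from the first line naming `bundle` to the next ERROR with `marker`."""
    stopped_at = None
    for line in lines:
        parsed = parse_karaf_line(line)
        if parsed is None:
            continue  # stack trace or other continuation line
        stamp, level, message = parsed
        if stopped_at is None:
            if bundle in message:
                stopped_at = stamp
        elif level == "ERROR" and marker in message:
            return (stamp - stopped_at) // timedelta(milliseconds=1)
    return None

# The three log lines quoted above.
LINES = [
    "2017-07-28 07:57:09,451 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | Stopping bundles:",
    "2017-07-28 07:57:09,454 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | org.opendaylight.controller.config-persister-impl/0.7.0.SNAPSHOT",
    "2017-07-28 07:57:09,466 | ERROR | config-pusher | ConfigRegistryImpl | 100 - org.opendaylight.controller.config-manager - 0.7.0.SNAPSHOT | Configuration Transaction failed on 2PC, server is unhealthy",
]

print(gap_after_bundle_stop(LINES, "config-persister-impl"))  # 12
```

A gap of 12 ms is consistent with the error being a direct consequence of the bundle stop, although timing alone does not prove the causal link.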

Comment by Luis Gomez [ 29/Jul/17 ]

After testing the different netconf features, I figured out that the one creating the problem is odl-netconf-ssh. Once I removed the prerequisite dependencies for this feature (not sure why they are required), I do not see the "server is unhealthy" problem anymore:

https://git.opendaylight.org/gerrit/#/c/60915/

Comment by Andrej Vanko [ 31/Jul/17 ]

> https://git.opendaylight.org/gerrit/#/c/60915/1

https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-nitrogen/20/
is building a zip to use in Sandbox.

Comment by Vratko Polak [ 31/Jul/17 ]

> https://git.opendaylight.org/gerrit/#/c/60915/1

Merged. We can also confirm that "server is unhealthy" does not happen. Marking this as FIXED.

Sandbox CSIT run [17] ended with basically everything failing, but that is for other Bugs to track.

[17] https://jenkins.opendaylight.org/sandbox/job/netconf-csit-1node-userfeatures-only-nitrogen/1/

Generated at Wed Feb 07 20:14:48 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.