[NETCONF-358] Fix failures in Karaf 4 feature tests Created: 02/Mar/17 Updated: 15/Mar/19 Resolved: 31/Jul/17 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Jakub Morvay |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
| External issue ID: | 7891 |
| Description |
|
In a recent merge job, while testing features4-netconf-connector, the SingleFeatureTest passed, but there are many ERRORs in the log [0]. I suspect this causes the failure [1] in the Integration/Distribution Karaf patch.
[0] https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/193/archives/features/netconf-connector/features4-netconf-connector/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz |
| Comments |
| Comment by Colin Dixon [ 24/Apr/17 ] |
|
Has this been fixed? Do we know? |
| Comment by Vratko Polak [ 25/Apr/17 ] |
|
Currently we do not know; the feature repos are still commented out of the Karaf 4 feature-index. |
| Comment by Vratko Polak [ 25/Apr/17 ] |
|
Oh, I had not realized that this is visible even without being tested by distribution-check. For example, considering features4-netconf-connector, in a recent surefire log [2] I see "Transaction failed on 2PC, server is unhealthy", which would fail distribution-check. But I believe most of the errors come [3] from features4-config-netty of the Controller project.
[2] https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/248/archives/features/netconf-connector/features4-netconf-connector/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz |
| Comment by Vratko Polak [ 25/Apr/17 ] |
|
> distribution-check
To be more specific: {project}-distribution4-check-carbon would be the job to prevent {project} from messing up features already included in Karaf 4 odl-integration-all. |
| Comment by Andrej Mak [ 27/Apr/17 ] |
|
"Transaction failed on 2PC, server is unhealthy" is caused by a restart during feature install. The restart is triggered when odl-mdsal-broker-local is installed in the netconf Karaf distribution. |
| Comment by Colin Dixon [ 27/Apr/17 ] |
|
Is this fixed? |
| Comment by Vratko Polak [ 27/Apr/17 ] |
|
Testing this was limited by slow distribution4-check jobs. [4] https://jenkins.opendaylight.org/releng/job/distribution4-check-carbon/21 |
| Comment by Vratko Polak [ 27/Apr/17 ] |
|
Actually it is worse. After The good news is that netconf devs now know what exactly is missing, so they can start fixing it. |
| Comment by Vratko Polak [ 28/Apr/17 ] |
|
Good news:
Bad news:
[5] https://git.opendaylight.org/gerrit/56036 |
| Comment by Andrej Mak [ 28/Apr/17 ] |
|
I've added netconf features to odl-integration-all [0]. Distribution4-check [1] failed with the following error, which is probably unrelated to the restarts mentioned above.
2017-04-28 12:37:01,523 | ERROR | pool-1-thread-2 | BootFeaturesInstaller | 7 - org.apache.karaf.features.core - 4.0.9 | Error installing boot features
[0] https://git.opendaylight.org/gerrit/#/c/56036/ |
| Comment by Vratko Polak [ 28/Apr/17 ] |
|
Andrej has created a patch [7], which dances around exclusion of sshd-core. [7] https://git.opendaylight.org/gerrit/56255 |
| Comment by Andrej Mak [ 03/May/17 ] |
|
A ResolutionException is thrown when a netconf feature is placed in featuresBoot in org.apache.karaf.features.cfg, regardless of the presence of the ssh feature. Installing via the console with feature:install works, though. |
| Comment by Andrej Mak [ 03/May/17 ] |
| Comment by Colin Dixon [ 03/May/17 ] |
|
That patch isn't passing verify: org.opendaylight.odlparent.featuretest.SingleFeatureTest |
| Comment by Andrej Mak [ 04/May/17 ] |
| Comment by Andrej Mak [ 04/May/17 ] |
|
The Carbon fix was merged. However, verification is blocked by |
| Comment by Andrej Mak [ 05/May/17 ] |
|
The ResolutionException is no longer present in the distribution4-check logs [9] after merging [10]. The test still fails on the "server is unhealthy" check, which is caused by restarting config bundles. Bundle restarting is discussed on the Release mailing list [11].
[9] https://jenkins.opendaylight.org/releng/job/distribution4-check-carbon/72/console |
| Comment by Vratko Polak [ 24/Jul/17 ] |
|
This is still preventing Netconf features from being included in the Nitrogen distribution. The most recent unsuccessful fix: [12]. |
| Comment by Jakub Morvay [ 28/Jul/17 ] |
|
I just have logs from the failing distribution-check job on patch https://git.opendaylight.org/gerrit/#/c/59607/, specifically the job https://jenkins.opendaylight.org/releng/job/distribution-check-nitrogen/250/. I guess the problem could be that the bundle config-persister-impl is being refreshed, which interrupts the initialization of netty-threadgroup-fixed and ends with the "Server is unhealthy" error being emitted.
2017-07-28 07:57:09,451 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | Stopping bundles:
After that we see the "Server is unhealthy" message:
2017-07-28 07:57:09,466 | ERROR | config-pusher | ConfigRegistryImpl | 100 - org.opendaylight.controller.config-manager - 0.7.0.SNAPSHOT | Configuration Transaction failed on 2PC, server is unhealthy in transaction TransactionIdentifier {name='ConfigTransaction-5-7'} at
From the logs I cannot really tell what exactly caused the refresh of the config-persister-impl bundle. The NETCONF project's features shouldn't depend on that kind of CSS stuff anymore. |
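For anyone who wants to check other karaf.log files for the same ordering (a "Stopping bundles" entry followed by the 2PC error), here is a minimal Python sketch. It assumes the pipe-separated layout of the two log lines quoted above; the names LogEntry, parse_line and stops_before_unhealthy are made up for illustration and are not part of any OpenDaylight or Karaf tooling.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    """One log line in the six-field, pipe-separated layout quoted above."""
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split a line into its six fields; return None for anything else
    (stack trace lines, continuation lines, blank lines)."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None
    return LogEntry(*parts)


def stops_before_unhealthy(
    lines: Iterable[str],
) -> List[Tuple[Optional[LogEntry], LogEntry]]:
    """Pair each 'server is unhealthy' ERROR with the most recent
    'Stopping bundles' entry seen before it (None if there was none)."""
    last_stop: Optional[LogEntry] = None
    pairs = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.message.startswith("Stopping bundles"):
            last_stop = entry
        elif entry.level == "ERROR" and "server is unhealthy" in entry.message.lower():
            pairs.append((last_stop, entry))
    return pairs


# The two log lines quoted in this comment, used as sample input.
SAMPLE = [
    "2017-07-28 07:57:09,451 | INFO | pool-2-thread-1 | FeaturesServiceImpl | "
    "6 - org.apache.karaf.features.core - 4.0.9 | Stopping bundles:",
    "2017-07-28 07:57:09,466 | ERROR | config-pusher | ConfigRegistryImpl | "
    "100 - org.opendaylight.controller.config-manager - 0.7.0.SNAPSHOT | "
    "Configuration Transaction failed on 2PC, server is unhealthy in transaction "
    "TransactionIdentifier {name='ConfigTransaction-5-7'} at",
]

if __name__ == "__main__":
    for stop, error in stops_before_unhealthy(SAMPLE):
        stop_time = stop.timestamp if stop else "(no earlier bundle stop)"
        print(f"bundle stop at {stop_time} -> error at {error.timestamp}")
```

This only shows that a bundle stop preceded the error in the log; it does not tell which feature triggered the refresh.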
| Comment by Luis Gomez [ 29/Jul/17 ] |
|
After testing the different netconf features, I figured out that the one creating the problem is odl-netconf-ssh. Once I removed the prerequisite dependencies for this feature (not sure why this is required), I do not see the "server is unhealthy" problem anymore: |
| Comment by Andrej Vanko [ 31/Jul/17 ] |
|
> https://git.opendaylight.org/gerrit/#/c/60915/1
https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-nitrogen/20/ |
| Comment by Vratko Polak [ 31/Jul/17 ] |
|
> https://git.opendaylight.org/gerrit/#/c/60915/1
Merged. We can also confirm that "server is unhealthy" does not happen. Marking this as FIXED. Sandbox CSIT run [17] ended with basically everything failing, but that is for other Bugs to track.
[17] https://jenkins.opendaylight.org/sandbox/job/netconf-csit-1node-userfeatures-only-nitrogen/1/ |