[CONTROLLER-1499] SR1 candidate fails when booting with features Created: 17/Mar/16 Updated: 19/Oct/17 Resolved: 22/Mar/16 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | config |
| Affects Version/s: | Beryllium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 5546 |
| Description |
|
The following error seems to happen when a config file is pushed, but only for features listed in featuresBoot, not when feature:install is issued at runtime:
2016-03-16 04:48:42,313 | ERROR | config-pusher | ConfigPusherImpl | 108 - org.opendaylight.controller.config-persister-impl - 0.4.1.Beryllium-SR1 | Failed to apply configuration snapshot: 00-netty.xml(odl-mdsal-broker,odl-mdsal-broker)
Also, this error does not happen on Beryllium snapshot builds. I attached TRACE logs taken with odl-config-netty as the feature in featuresBoot. |
| Comments |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
Attachment karaf_log_sr1_2016317.tar.xz has been added with description: Archive with trace logs on SR1 |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
I also attached trace logs from a Beryllium snapshot build. I have not spotted any useful difference yet. Here is the full error from SR1: |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
Attachment karaf_log_snapshot_20160317.tar.xz has been added with description: Archive with trace logs on Beryllium snapshot |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
As Ryan Goulding points out, the symptoms are similar to |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
Still present on today's autorelease build. |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
This bug is also present on the Boron autorelease build. It is not present on the Boron snapshot build, but the last such distribution build is quite old (2016-03-13). |
| Comment by Vratko Polak [ 17/Mar/16 ] |
|
I tried to reproduce it on the Beryllium snapshot codebase using the multipatch job [0] on Sandbox. It built odlparent, yangtools, mdsal, controller, aaa and netconf; the other parts of integration/distribution were downloaded from Nexus. It seems like the root of this error is in autorelease. I compared the jars in system/ between SR1 and snapshot. All non-ODL artifacts were identical and all ODL artifacts differed (except org/opendaylight/controller/thirdparty/net.sf.jung2/2.0.1/net.sf.jung2-2.0.1.jar and org/opendaylight/controller/sal/0.7.0/sal-0.7.0.jar, which I guess is expected). Perhaps the YANG parser issue with insufficient memory only hits huge builds (which would explain why the docs build is faster when moved to the front) and can somehow produce a broken build? [0] https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-beryllium/1/ |
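The jar comparison described above can be sketched in Python as follows. This is a minimal sketch, not the tooling actually used in this investigation. It assumes both distributions are unpacked locally and that system/ follows the Maven repository layout <group>/<artifactId>/<version>/<file>. The function names are hypothetical. Version directories are stripped so that SR1 and SNAPSHOT builds of the same artifact line up. Hashing raw bytes flags any rebuilt jar as different, because zip entry timestamps and manifests change on every build. A difference therefore only shows that an artifact was rebuilt, not that its content is broken.

```python
import hashlib
from pathlib import Path


def artifact_key(rel):
    """Drop the version from a Maven-layout path so SR1 and SNAPSHOT builds line up.

    Assumes <group path>/<artifactId>/<version>/<artifactId>-<version>[...].jar.
    """
    parts = rel.split("/")
    if len(parts) < 3:
        return rel
    artifact_id, version, name = parts[-3], parts[-2], parts[-1]
    prefix = f"{artifact_id}-{version}"
    if name.startswith(prefix):
        name = f"{artifact_id}-<v>{name[len(prefix):]}"
    # Keep the group path and artifactId, skip the version directory.
    return "/".join(parts[:-2] + [name])


def jar_digests(system_dir):
    """Map each version-free artifact key to the set of SHA-256 digests found for it."""
    root = Path(system_dir)
    digests = {}
    for jar in sorted(root.rglob("*.jar")):
        if not jar.is_file():
            continue
        key = artifact_key(jar.relative_to(root).as_posix())
        digests.setdefault(key, set()).add(hashlib.sha256(jar.read_bytes()).hexdigest())
    return digests


def compare_system_dirs(dir_a, dir_b):
    """Return (identical, different, only_in_a, only_in_b) as sorted lists of artifact keys."""
    a = jar_digests(dir_a)
    b = jar_digests(dir_b)
    common = a.keys() & b.keys()
    identical = sorted(k for k in common if a[k] == b[k])
    different = sorted(k for k in common if a[k] != b[k])
    return identical, different, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


if __name__ == "__main__":
    import sys

    same, diff, only_a, only_b = compare_system_dirs(sys.argv[1], sys.argv[2])
    print(f"identical: {len(same)}, different: {len(diff)}, "
          f"only in first: {len(only_a)}, only in second: {len(only_b)}")
    for key in diff:
        print("DIFF", key)
```

Usage (the script name and directory names are hypothetical): python compare_system.py sr1/system snapshot/system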
| Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ] |
|
If I want to reproduce this locally, what are the steps to make it happen? |
| Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ] |
|
(In reply to Thanh Ha from comment #6) Figured it out. At minimum, adding the following to your etc/org.apache.karaf.features.cfg should reproduce the issue:
featuresBoot=config,standard,region,package,kar,ssh,management,odl-integration-compatible-with-all,odl-restconf |
| Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ] |
|
(In reply to Vratko Polák from comment #5) You might be on to something. I tried a minimal local autorelease build of only odlparent, yangtools, mdsal, controller and aaa, and I couldn't reproduce the problem either. Bizarre. |
| Comment by Luis Gomez [ 18/Mar/16 ] |
|
Thanh, you can omit odl-integration-compatible-with-all; odl-restconf alone is enough to reproduce the issue. |
| Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ] |
|
To test the theory that autorelease is too big, I created a test job that splits the build into 3 parts: offset0, offset1 and offset2. |
| Comment by Vratko Polak [ 18/Mar/16 ] |
|
> autorelease-release-test-beryllium/2/console
That run did not ignore unit test failures, so we do not have the resulting .zip. |
| Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ] |
|
JJB is supposed to ignore manually created jobs. I guess there is a bug somewhere, because builder removed the job. I'll submit a patch to builder to add the test jobs. (In reply to Vratko Polák from comment #11) I can add another offset profile to split it further. Did you see it fail? I was asleep during the run, so I wasn't able to watch it. |
| Comment by Vratko Polak [ 18/Mar/16 ] |
|
Yes, it failed on a unit test. I am 80% sure it was an exact repeat of this one: https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-beryllium/org.opendaylight.groupbasedpolicy$ofoverlay-renderer/90/testReport/junit/org.opendaylight.groupbasedpolicy.renderer.ofoverlay.arp/ArpSenderTest/floodArpTest/ |
| Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ] |
|
Interestingly, my other attempt, build #96, does not appear to have the test failures we saw in build #90: https://jenkins.opendaylight.org/releng/job/autorelease-release-beryllium/96/ I made 2 changes for this run: 1) Revert my 10g patch https://git.opendaylight.org/gerrit/36208 The build is still running, though, so let's see if it fails with the same issues. |
| Comment by A H [ 18/Mar/16 ] |
|
This is looking very promising. So far aaa, capwap, etc. have their tests passing. The fix seems to be working. |
| Comment by Vratko Polak [ 22/Mar/16 ] |
|
This does not happen on recent autorelease builds anymore, marking as fixed. |