[CONTROLLER-1499] SR1 candidate fails when booting with features Created: 17/Mar/16  Updated: 19/Oct/17  Resolved: 22/Mar/16

Status: Resolved
Project: controller
Component/s: config
Affects Version/s: Beryllium
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File karaf_log_snapshot_20160317.tar.xz     File karaf_log_sr1_2016317.tar.xz    
External issue ID: 5546

 Description   

The following error appears when a config file is pushed, but only for features listed in featuresBoot, not when feature:install is issued at runtime.

2016-03-16 04:48:42,313 | ERROR | config-pusher | ConfigPusherImpl | 108 - org.opendaylight.controller.config-persister-impl - 0.4.1.Beryllium-SR1 | Failed to apply configuration snapshot: 00-netty.xml(odl-mdsal-broker,odl-mdsal-broker)
java.lang.SecurityException: Insufficient roles/credentials for operation
at org.apache.karaf.management.KarafMBeanServerGuard.handleInvoke(KarafMBeanServerGuard.java:289)[235:org.apache.karaf.management.server:3.0.3]

Also, this error does not happen on Beryllium snapshot builds. Attached are TRACE logs captured with odl-config-netty as the feature present in featuresBoot.



 Comments   
Comment by Vratko Polak [ 17/Mar/16 ]

Attachment karaf_log_sr1_2016317.tar.xz has been added with description: Archive with trace logs on SR1

Comment by Vratko Polak [ 17/Mar/16 ]

Also attached trace logs from a Beryllium snapshot build. I have not spotted any useful difference yet.

Here is the full error from SR1:
2016-03-17 08:52:53,936 | ERROR | config-pusher | ConfigPusherImpl | 98 - org.opendaylight.controller.config-persister-impl - 0.4.1.Beryllium-SR1 | Failed to apply configuration snapshot: 00-netty.xml(odl-config-netty,odl-config-netty)
java.lang.SecurityException: Insufficient roles/credentials for operation
at org.apache.karaf.management.KarafMBeanServerGuard.handleInvoke(KarafMBeanServerGuard.java:289)[108:org.apache.karaf.management.server:3.0.3]
at org.apache.karaf.management.KarafMBeanServerGuard.invoke(KarafMBeanServerGuard.java:85)[108:org.apache.karaf.management.server:3.0.3]
at org.apache.karaf.management.boot.KarafMBeanServerBuilder$MBeanInvocationHandler.invoke(KarafMBeanServerBuilder.java:63)[karaf-jmx-boot.jar:]
at com.sun.proxy.$Proxy0.invoke(Unknown Source)[:]
at com.sun.jmx.mbeanserver.MXBeanProxy$InvokeHandler.invoke(Unknown Source)[:1.7.0_67]
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(Unknown Source)[:1.7.0_67]
at javax.management.MBeanServerInvocationHandler.invoke(Unknown Source)[:1.7.0_67]
at com.sun.proxy.$Proxy20.beginConfig(Unknown Source)[66:org.opendaylight.controller.config-api:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.util.ConfigRegistryJMXClient.beginConfig(ConfigRegistryJMXClient.java:104)[90:org.opendaylight.controller.config-util:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.facade.xml.transactions.TransactionProvider.getTestTransaction(TransactionProvider.java:117)[96:org.opendaylight.controller.config-manager-facade-xml:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.test(ConfigSubsystemFacade.java:154)[96:org.opendaylight.controller.config-manager-facade-xml:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.executeTests(ConfigSubsystemFacade.java:149)[96:org.opendaylight.controller.config-manager-facade-xml:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.facade.xml.ConfigSubsystemFacade.executeConfigExecution(ConfigSubsystemFacade.java:123)[96:org.opendaylight.controller.config-manager-facade-xml:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.executeWithMissingModuleFactoryRetries(ConfigPusherImpl.java:329)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:305)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:164)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:129)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.processSingle(ConfigPusherImpl.java:83)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:74)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at org.opendaylight.controller.config.persist.impl.osgi.ConfigPersisterActivator$2.run(ConfigPersisterActivator.java:131)[98:org.opendaylight.controller.config-persister-impl:0.4.1.Beryllium-SR1]
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
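
For triaging the attached archives, a small script can pull out just the config-pusher failures. This is only a sketch, assuming the pipe-delimited layout seen in the log lines above (timestamp | level | thread | logger | bundle | message); the names KarafEntry, parse_karaf_line and failed_snapshots are made up for illustration and are not part of Karaf or ODL.

```python
from typing import Iterable, List, NamedTuple, Optional


class KarafEntry(NamedTuple):
    """One log record, assuming the six pipe-separated fields seen in this issue."""
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


def parse_karaf_line(line: str) -> Optional[KarafEntry]:
    """Split one log line into fields; return None for continuation lines."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None  # stack-trace frames and exception headers have no pipes
    return KarafEntry(*parts)


def failed_snapshots(lines: Iterable[str]) -> List[str]:
    """Collect messages of ERROR entries logged by the config-pusher thread."""
    hits = []
    for line in lines:
        entry = parse_karaf_line(line)
        if entry and entry.level == "ERROR" and entry.thread == "config-pusher":
            hits.append(entry.message)
    return hits
```

Feeding it the lines of an unpacked karaf.log from either archive should list each "Failed to apply configuration snapshot" message, which makes the SR1 and snapshot logs easier to compare side by side.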

Comment by Vratko Polak [ 17/Mar/16 ]

Attachment karaf_log_snapshot_20160317.tar.xz has been added with description: Archive with trace logs on Beryllium snapshot

Comment by Vratko Polak [ 17/Mar/16 ]

As Ryan Goulding points out, symptoms are similar to CONTROLLER-623.

Comment by Vratko Polak [ 17/Mar/16 ]

Still present on today's autorelease build.
https://jenkins.opendaylight.org/releng/view/autorelease/job/aaa-csit-1node-authn-only-beryllium/1162/console

Comment by Vratko Polak [ 17/Mar/16 ]

This bug is also present on the Boron autorelease build. It is not present on the Boron snapshot build, but the last such distribution build is quite old (2016-03-13).

Comment by Vratko Polak [ 17/Mar/16 ]

I tried to reproduce this on the Beryllium snapshot codebase by using the multipatch job [0] on Sandbox. It built odlparent, yangtools, mdsal, controller, aaa and netconf; other parts of integration/distribution were downloaded from Nexus.
The Netconf userfeatures-only job [1] showed neither this bug nor memory errors.

It seems the root of this error is in autorelease.

I have compared jars in system/ between SR1 and snapshot. All non-ODL artifacts were the same, all ODL artifacts were different (except org/opendaylight/controller/thirdparty/net.sf.jung2/2.0.1/net.sf.jung2-2.0.1.jar and org/opendaylight/controller/sal/0.7.0/sal-0.7.0.jar which I guess is correct).
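
A comparison like this can be scripted. The following is a minimal sketch, not the procedure actually used here: it assumes both distributions are unpacked locally and compares every .jar under the two system/ directories by SHA-256. The function names jar_digests and compare_trees are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def jar_digests(root: Path) -> Dict[str, str]:
    """Map each .jar path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for jar in sorted(root.rglob("*.jar")):
        rel = jar.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(jar.read_bytes()).hexdigest()
    return digests


def compare_trees(left: Path, right: Path) -> Tuple[List[str], List[str], List[str]]:
    """Return (identical, differing, present on one side only) relative paths."""
    a, b = jar_digests(left), jar_digests(right)
    common = a.keys() & b.keys()
    same = sorted(p for p in common if a[p] == b[p])
    diff = sorted(p for p in common if a[p] != b[p])
    only = sorted(a.keys() ^ b.keys())
    return same, diff, only
```

Note that artifacts whose version is part of the path (release versus snapshot version strings) would land in the third list rather than the second, so that list needs to be read with the version difference in mind.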

Perhaps the yang parser issue with insufficient memory hits only on huge builds (which would explain why the docs build is faster when moved to the front) and can somehow produce a broken build?

[0] https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-beryllium/1/
[1] https://jenkins.opendaylight.org/sandbox/job/netconf-csit-1node-userfeatures-only-beryllium/1/

Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ]

If I want to reproduce locally what are the steps to make this happen?

Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ]

(In reply to Thanh Ha from comment #6)
> If I want to reproduce locally what are the steps to make this happen?

Figured it out. At minimum, add the following to your etc/org.apache.karaf.features.cfg:

featuresBoot=config,standard,region,package,kar,ssh,management,odl-integration-compatible-with-all,odl-restconf

That should reproduce the issue.
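
For repeated attempts, the edit to the cfg file can be scripted. This is a sketch only: it assumes the featuresBoot entry sits on a single line with no backslash continuations, and the helper name set_features_boot is invented for illustration.

```python
import re
from typing import List


def set_features_boot(cfg_text: str, features: List[str]) -> str:
    """Return cfg_text with its featuresBoot entry replaced (or appended if absent)."""
    new_line = "featuresBoot=" + ",".join(features)
    # Match only an uncommented, single-line featuresBoot entry.
    pattern = re.compile(r"^featuresBoot\s*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(lambda _m: new_line, cfg_text, count=1)
    # No existing entry: append one, adding a newline separator if needed.
    sep = "" if cfg_text.endswith("\n") or not cfg_text else "\n"
    return cfg_text + sep + new_line + "\n"
```

Typical use would be to read etc/org.apache.karaf.features.cfg, pass its text through set_features_boot with the feature list above, write the result back, and then start Karaf from a clean distribution so the features are installed at boot rather than at runtime.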

Comment by Thanh Ha (zxiiro) [ 17/Mar/16 ]

(In reply to Vratko Polák from comment #5)
> I tried to reproduce on Beryllium snapshot codebase by using multipatch job
> [0] on Sandbox. It built odlparent, yangtools, mdsal, controller, aaa and
> netconf, other parts of integration/distribution were downloaded from Nexus.
> Netconf userfeatures only job [1] did not show this Bug, nor memory errors.

You might be on to something. I tried a minimal local build of autorelease, building only odlparent, yangtools, mdsal, controller, and aaa. I couldn't reproduce the problem either.

Bizarre.

Comment by Luis Gomez [ 18/Mar/16 ]

Thanh, you can omit odl-integration-compatible-with-all; odl-restconf alone is enough to reproduce the issue.

Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ]

To test the theory that autorelease is too big, I created a test job that splits the build into 3 parts: offset0, offset1, and offset2.

https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-test-beryllium/2/console

Comment by Vratko Polak [ 18/Mar/16 ]

> autorelease-release-test-beryllium/2/console

That run did not ignore unit test failures, so we do not have a resulting .zip.
Also, perhaps offset 2 can be divided into smaller chunks?

Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ]

JJB is supposed to ignore manually created jobs. I guess there's a bug somewhere, because the builder removed the job.

I'll submit a patch to build to add the test jobs.

(In reply to Vratko Polák from comment #11)
> > autorelease-release-test-beryllium/2/console
>
> That run did not ignore unit test failures, so we do not have resulting .zip.
> Also, perhaps offset 2 can be divided to smaller chunks?

I can add another offset profile to split it more. Did you see it fail? I was asleep during the run so I wasn't able to watch it.

Comment by Vratko Polak [ 18/Mar/16 ]

Yes, it failed on a unit test. I am 80% sure it was an exact repeat of this one: https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-beryllium/org.opendaylight.groupbasedpolicy$ofoverlay-renderer/90/testReport/junit/org.opendaylight.groupbasedpolicy.renderer.ofoverlay.arp/ArpSenderTest/floodArpTest/

Comment by Thanh Ha (zxiiro) [ 18/Mar/16 ]

Interesting: my other attempt, build #96, appears not to have the test failures we saw in build #90.

https://jenkins.opendaylight.org/releng/job/autorelease-release-beryllium/96/

I made 2 changes for this run.

1) Reverted my 10g patch: https://git.opendaylight.org/gerrit/36208
2) Removed the redundant "install" goal: https://git.opendaylight.org/gerrit/36420

The build is still running, though, so let's see if it fails with the same issues.

Comment by A H [ 18/Mar/16 ]

This is looking very promising. So far aaa, capwap, etc. have their tests passing. The fix seems to be working.

https://jenkins.opendaylight.org/releng/view/autorelease/job/integration-distribution-test-beryllium/346/

Comment by Vratko Polak [ 22/Mar/16 ]

This no longer happens on recent autorelease builds; marking as fixed.

Generated at Wed Feb 07 19:55:42 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.