[CONTROLLER-1650] sal-binding-broker-impl times out waiting for ClassLoadingStrategy Created: 01/May/17  Updated: 25/Jul/23  Resolved: 04/May/17

Status: Resolved
Project: controller
Component/s: blueprint
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Colin Dixon Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz    
Issue Links:
Blocks
blocks RELENG-47 get autorelease-carbon working consis... Resolved
External issue ID: 8342

 Description   

I'm seeing periodic SingleFeatureTest hangs in autorelease that seem to all trace back to this root issue:

2017-05-01 10:44:13,976 | ERROR | rint Extender: 1 | BlueprintContainerImpl | 12 - org.apache.aries.blueprint.core - 1.7.1 | Unable to start blueprint container for bundle org.opendaylight.controller.sal-binding-broker-impl/1.5.0.Carbon due to unresolved dependencies [(objectClass=org.opendaylight.mdsal.binding.generator.api.ClassLoadingStrategy)]
java.util.concurrent.TimeoutException
at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:371)[12:org.apache.aries.blueprint.core:1.7.1]

I'm not sure what is causing this, but this and BUG-8335 together account for the vast majority of autorelease failures at the moment.



 Comments   
Comment by Colin Dixon [ 01/May/17 ]

Attachment org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz has been added with description: this is an example of the failure, taken from failnever job #3

Comment by Colin Dixon [ 01/May/17 ]

Jobs that failed with SFT hangs:
https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-carbon/271/
https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-failnever-carbon/1/
https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-failnever-carbon/3/
https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-failnever-carbon/4/

Comment by Colin Dixon [ 02/May/17 ]

Robert Varga thinks this is the issue:

2017-05-01 06:14:00,894 | WARN | pool-2-thread-1 | AugmentStatementImpl | 370 - org.opendaylight.yangtools.yang-parser-impl - 1.1.0.Carbon | Failed to add augmentation alto-model-config:63:12 defined at alto-manual-maps:80:4
org.opendaylight.yangtools.yang.parser.spi.meta.InferenceException: An augment cannot add node 'resource-id' because it is mandatory and in module different than target [at alto-networkmap-config:49:8]
at org.opendaylight.yangtools.yang.parser.spi.meta.InferenceException.throwIf(InferenceException.java:47)
at org.opendaylight.yangtools.yang.parser.stmt.rfc6020.AugmentStatementImpl$Definition.checkForMandatoryNodes(AugmentStatementImpl.java:301)
at java.util.ArrayList.forEach(ArrayList.java:1249)[:1.8.0_121]
at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)[:1.8.0_121]
at org.opendaylight.yangtools.yang.parser.stmt.rfc6020.AugmentStatementImpl$Definition.checkForMandatoryNodes(AugmentStatementImpl.java:298)
at org.opendaylight.yangtools.yang

In the recent alto failure, it appears while the feature service is starting config-manager. As a result, config-manager never starts and the ClassLoadingStrategy is never exported.

This doesn't happen in the NEMO case, but in the NEMO case it seems like for whatever reason there's a deadlock if the ClassLoadingStrategy isn't exported.

Comment by Robert Varga [ 02/May/17 ]

The problems seem to be related to config-manager either not being started (in the nemo case) or throwing an exception (in the alto case).

In either case it seems config-manager's activator is not completing.

Comment by Colin Dixon [ 02/May/17 ]

It sounds like there might be a way to change the run level of the config-manager to make sure it loads first or early.

Comment by Colin Dixon [ 02/May/17 ]

Vratko pointed out that the latest failures in carbon-merge are a reasonable place to check whether this occurs outside of autorelease:
https://jenkins.opendaylight.org/releng/view/Merge-Carbon/

Comment by Colin Dixon [ 02/May/17 ]

For what it's worth, of the lastFailedBuilds, 3 hung in SFT:
https://jenkins.opendaylight.org/releng/view/Merge-Carbon/job/distribution-merge-carbon/1717/
https://jenkins.opendaylight.org/releng/view/Merge-Carbon/job/netvirt-merge-carbon/737/
https://jenkins.opendaylight.org/releng/view/Merge-Carbon/job/netconf-merge-carbon/257/

At least the distribution issue appears to be an occurrence of the same thing, so it's not just an autorelease issue:
odl-integration-all in distribution merge carbon 1717:
https://logs.opendaylight.org/releng/jenkins092/distribution-merge-carbon/1717/archives/features/singles/odl-integration-all/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz

2017-04-30 11:42:25,477 | ERROR | rint Extender: 3 | BlueprintContainerImpl | 12 - org.apache.aries.blueprint.core - 1.7.1 | Unable to start blueprint container for bundle org.opendaylight.controller.sal-binding-broker-impl/1.5.0.SNAPSHOT due to unresolved dependencies [(objectClass=org.opendaylight.mdsal.binding.generator.api.ClassLoadingStrategy)]

The netvirt case doesn't seem to have surefire logs for the odl-netvirt-openstack feature.

The netconf case doesn't end up hitting exactly the same issue, but it's similar:
https://logs.opendaylight.org/releng/jenkins092/netconf-merge-carbon/257/archives/features/netconf-connector/odl-message-bus/target/surefire-reports/org.opendaylight.odlparent.featuretest.SingleFeatureTest-output.txt.gz

2017-04-28 14:11:47,635 | ERROR | rint Extender: 2 | BlueprintContainerImpl | 12 - org.apache.aries.blueprint.core - 1.7.1 | Unable to start blueprint container for bundle org.opendaylight.netconf.messagebus-netconf/1.2.0.SNAPSHOT due to unresolved dependencies [(objectClass=org.opendaylight.controller.messagebus.spi.EventSourceRegistry)]
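
The blueprint errors quoted in this issue all share one message format, so SFT output can be scanned for them mechanically when triaging further failed jobs. Below is a minimal sketch of such a scan. The class name BlueprintTimeoutScanner and its scan method are hypothetical helpers, not part of any ODL tooling, and the regular expression is derived only from the log lines quoted here.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; the message format is taken from the log lines in this issue.
public class BlueprintTimeoutScanner {
    // Group 1 is the bundle (name/version), group 2 the unresolved service filter(s).
    private static final Pattern UNRESOLVED = Pattern.compile(
        "Unable to start blueprint container for bundle (\\S+) "
        + "due to unresolved dependencies \\[(.*)\\]");

    /** Returns one "bundle -> filters" entry for every matching log line. */
    public static List<String> scan(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = UNRESOLVED.matcher(line);
            if (m.find()) {
                hits.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return hits;
    }

    // Reads decompressed SFT output from standard input and prints the hits.
    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        scan(lines).forEach(System.out::println);
    }
}
```

Since the surefire output is gzipped, it would need to be decompressed first and piped into the program on standard input.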

Comment by Tom Pantelis [ 02/May/17 ]

We should add logging in ConfigManagerActivator to see if it even runs. I can push a patch for that. It already catches and logs Exception (WARN) and Error (ERROR) so we should know if it failed.

If it didn't even run, that indicates either the bundle didn't resolve (unlikely) or some other bundle activator is stuck, preventing ConfigManagerActivator from running. For the latter we'd need a thread dump.
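
For reference, a thread dump can be captured from inside a JVM with the standard java.lang.management API. The sketch below is illustrative only: the class name ThreadDumpSketch is hypothetical, and it dumps the threads of the JVM it runs in, so it would have to execute inside the hung Karaf process (for example from test code) to be useful here. Running jstack against the process ID from outside is the more usual route.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: dumps all threads of the current JVM as text.
public class ThreadDumpSketch {
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details the JVM says it can provide.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(),
            mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // A stuck bundle activator would show up as a blocked or waiting
            // thread with the activator's start() method in its stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```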

Comment by Tom Pantelis [ 02/May/17 ]

https://git.opendaylight.org/gerrit/#/c/56406/ to add info logging.

Comment by Robert Varga [ 03/May/17 ]

carbon: https://git.opendaylight.org/gerrit/56437

Generated at Wed Feb 07 19:56:05 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.