[NETCONF-97] netconf won't come up with 'odl-integration-compatible-with-all' Created: 05/Nov/15  Updated: 15/Mar/19  Resolved: 04/Jan/16

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
is blocked by SNMP4SDN-9 config.yang file conflict between snm... Resolved
External issue ID: 4583

 Description   

Put 'config,standard,region,package,kar,ssh,management,odl-integration-compatible-with-all,odl-restconf,odl-netconf-mdsal,odl-netconf-connector-all' into bootFeatures, then fire ODL up. Netconf will fail to come up.

When debug logging was enabled around the time Netconf was expected to be up, the following fragment was spotted in the logs repeating over and over again:

...
2015-11-05 13:22:50,644 | DEBUG | r - ConfigPusher | bind | - - | Trying to create the platform default provider
2015-11-05 13:22:50,644 | DEBUG | r - ConfigPusher | bind | - - | Trying to load com.sun.xml.internal.bind.v2.ContextFactory
2015-11-05 13:22:50,644 | DEBUG | r - ConfigPusher | bind | - - | loaded com.sun.xml.internal.bind.v2.ContextFactory from jar:file:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre/lib/rt.jar!/com/sun/xml/internal/bind/v2/ContextFactory.class
2015-11-05 13:22:50,644 | DEBUG | r - ConfigPusher | ContextFactory | - - | Property com.sun.xml.internal.bind.XmlAccessorFactoryis not active. Using JAXB's implementation
2015-11-05 13:22:50,645 | DEBUG | r - ConfigPusher | OptimizedAccessorFactory | - - | Using optimized Accessor for public java.util.SortedSet org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.getCapabilities() and public void org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.setCapabilities(java.util.SortedSet)
2015-11-05 13:22:50,646 | DEBUG | r - ConfigPusher | OptimizedAccessorFactory | - - | Using optimized Accessor for public java.lang.String org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.getConfigSnapshot() and public void org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.setConfigSnapshot(java.lang.String)
2015-11-05 13:22:50,646 | DEBUG | r - ConfigPusher | OptimizedAccessorFactory | - - | Using optimized Accessor for public java.util.Set org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.getFeatures() and public void org.opendaylight.controller.config.persist.storage.file.xml.model.ConfigSnapshot.setFeatures(java.util.Set)
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | Trying to locate org/opendaylight/controller/config/persist/storage/file/xml/model/jaxb.properties
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | not found
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | Checking system property javax.xml.bind.JAXBContext
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | not found
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | Checking META-INF/services
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | Unable to find: META-INF/services/javax.xml.bind.JAXBContext
2015-11-05 13:22:50,648 | DEBUG | r - ConfigPusher | bind | - - | Trying to create the platform default provider
...

(the first "Trying to create the platform default provider" at the start is from the previous iteration of the infinite loop, the last one belongs to the current iteration and is followed by another instance of the second log message which is followed by another instance of the third log message and so on and so forth).
(messages from other features/subsystems are randomly interspersed).

From my perspective this is what the logs are telling me about what is happening:

1. "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre/lib/rt.jar!/com/sun/xml/internal/bind/v2/ContextFactory.class" tries to access "com.sun.xml.internal.bind.XmlAccessorFactoryis"
2. Because "com.sun.xml.internal.bind.XmlAccessorFactoryis" is "not active", the code tries to load and use "JAXB's implementation"
3. The "Using JAXB's implementation" tries to access "jaxb.properties".
4. The "jaxb.properties" is discovered to be nowhere to be found.
5. Because of that, "META-INF/services/javax.xml.bind.JAXBContext" cannot be found and the code thus tries to create "the default platform provider"
6. The default platform provider is "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre/lib/rt.jar!/com/sun/xml/internal/bind/v2/ContextFactory.class" and because of that the loop repeats.



 Comments   
Comment by Jozef Behran [ 05/Nov/15 ]

Occurs in stable-lithium branch:

https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-userfeatures-all-stable-lithium/

Comment by Luis Gomez [ 05/Nov/15 ]

Right we talk about this issue every Li SR reviews, thanks for opening the bug.

Comment by Vratko Polak [ 09/Nov/15 ]

Looking at karaf.log [0], there is another symptom, visible due to workaround for CONTROLLER-1200. This message keeps repeating:

2015-11-05 19:19:46,735 | WARN | r - ConfigPusher | FeatureConfigPusher | 144 - config-persister-feature-adapter - 0.3.3.SNAPSHOT | Feature: standard-condition-webconsole_0_0_0, 3.0.3 is missing from features service. Skipping

Perhaps ConfigPushingRunnable does not detect there is nothing more to push; or (more probably) there is still something to push, but there is not enough logging to see what it is and why it does not get pushed quickly enough.

[0] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-userfeatures-all-stable-lithium/lastSuccessfulBuild/artifact/karaf.log

Comment by Vratko Polak [ 12/Nov/15 ]

Manual testing shows that if all odl-sfc* features are removed from odl-integration-compatible-with-all, this Bug does not happen.
More examination in progress.

Comment by Vratko Polak [ 12/Nov/15 ]

It is the
<configfile finalname="etc/opendaylight/karaf/sfc-netconf-initial.xml">
of odl-sfc-netconf which triggers the behavior.

But there were no changes in sfc or main netconf recently, the change which introduced failures was in fact https://git.opendaylight.org/gerrit/28985

So this seem to be a bug in config persister implementation.

Comment by Vratko Polak [ 12/Nov/15 ]

> the change which introduced failures was in fact https://git.opendaylight.org/gerrit/28985

Not true. I looked at wrong job, sorry.
This Lithium Bug occurred way earlier than that.

Comment by Vratko Polak [ 12/Nov/15 ]

> etc/opendaylight/karaf/sfc-netconf-initial.xml

And also this is not a real signal, just a coincidence due to my uneven patience.
Both with and without this file ODL stabilizes at around the same time (12 and half minutes since started, in my environment).
I still have to verify controller-netconf is mounted after that time.

If this really turns out to be merely a bootup performance bug, perhaps this e-mail would be relevant for the fix: https://lists.opendaylight.org/pipermail/controller-dev/2015-September/010305.html

Comment by Vratko Polak [ 12/Nov/15 ]

Turns out my first binary search was focused on the wrong symptom. CPU stability always eventually happen, the critical symptom is whether there are data mounted on controller-config afterwards.

Comment by Vratko Polak [ 13/Nov/15 ]

My examination found a single culprit:
<bundle>mvn:org.opendaylight.snmp4sdn/md-model/0.2.3-SNAPSHOT</bundle>
in other words this Bug is a consequence of SNMP4SDN-9.

Workaround to unblock testing without actually fixing SNMP4SDN-9 is to move snmp4sdn from odl-integration-compatble-with-all:
https://git.opendaylight.org/gerrit/29671 (integration/distribution Lithium)

Comment by Christine Hsieh [ 04/Jan/16 ]

SNMP4SDN-9 was just fixed, thanks.

Generated at Wed Feb 07 20:14:10 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.