[NETCONF-443] Netconf device configured in clustered topology may not get connected after restart Created: 27/Jul/17  Updated: 21/Mar/22

Status: Open
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8899

 Description   

Not sure whether this is Netconf/netconf bug or Controller/clustering bug.

The "entity" suite has detected this as a failure [0]. This (without any failure in preceding tests) happened for the first time, but previous and next [1] run have no failures.

Comparing karaf logs I do not see a clear cause, but I see the test step is executed in different stage of ODL booting up.

In no-failure run I see [2]:

2017-07-27 00:57:22,815 | INFO | e]-nio2-thread-1 | ServerSession | 30 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:59150
2017-07-27 00:57:23,053 | INFO | e]-nio2-thread-3 | ServerUserAuthService | 30 - org.apache.sshd.core - 0.14.0 | Session karaf@/10.29.4.11:59150 authenticated
...
2017-07-27 00:57:24,056 | INFO | on-dispatcher-65 | NetconfTopologyUtils | 282 - org.opendaylight.netconf.topology-singleton - 1.2.2.SNAPSHOT | RemoteDevice

{netconf-test-device} : using the default directory cache/schema
2017-07-27 00:57:24,151 | INFO | lt-dispatcher-22 | NetconfNodeActor | 282 - org.opendaylight.netconf.topology-singleton - 1.2.2.SNAPSHOT | RemoteDevice{netconf-test-device}

: Schema context resolved: [ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=2010-09-24, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=yang-ext, namespace=urn:opendaylight:yang:extension:yang-ext, revision=2013-07-09, prefix=ext, yangVersion=1], ModuleEffectiveStatementImpl[name=people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:people, revision=2014-08-18, prefix=people, yangVersion=1], ModuleEffectiveStatementImpl[name=test, namespace=urn:opendaylight:test:netconf:crud, revision=2014-10-18, prefix=tt, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-yang-types, namespace=urn:ietf:params:xml:ns:yang:ietf-yang-types, revision=2013-07-15, prefix=yang, yangVersion=1], ModuleEffectiveStatementImpl[name=car, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car, revision=2014-08-18, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-people, revision=2014-08-18, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-purchase, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-purchase, revision=2014-08-18, prefix=cp, yangVersion=1], ModuleEffectiveStatementImpl[name=ncmount-l2fib, namespace=org:opendaylight:coretutorials:ncmount:example:l2fib, revision=2016-03-07, prefix=l2fib, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=2013-07-15, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring, revision=2010-10-04, prefix=ncm, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring-extension, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring-extension, revision=2013-12-10, prefix=ncme, yangVersion=1]]
2017-07-27 00:57:24,155 | INFO | lt-dispatcher-22 | SlaveSalFacade | 282 - org.opendaylight.netconf.topology-singleton - 1.2.2.SNAPSHOT | RemoteDevice

{netconf-test-device}: Slave mount point registered.
2017-07-27 00:57:24,941 | INFO | h for user karaf | command | 291 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Modify_Device_Data_With_Original_Owner

In the failed run [3]:

2017-07-26 07:07:15,552 | INFO | on-dispatcher-59 | NetconfTopologyUtils | 282 - org.opendaylight.netconf.topology-singleton - 1.2.2.Carbon | RemoteDevice{netconf-test-device}

: using the default directory cache/schema
2017-07-26 07:07:17,208 | INFO | 7]-nio2-thread-1 | ServerSession | 30 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:59212
2017-07-26 07:07:17,350 | INFO | 7]-nio2-thread-4 | ServerUserAuthService | 30 - org.apache.sshd.core - 0.14.0 | Session karaf@/10.29.4.11:59212 authenticated
2017-07-26 07:07:17,546 | INFO | h for user karaf | command | 291 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Check_Config_Data_After_Original_Owner_Restart
(and no Schema context resolved line)

I do not think Robot connecting to karaf ssh port and logging the start of test step can make a difference.

I will try to reproduce on Sandbox with more verbose log.

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/607/log.html.gz#s1-s5-t14-k2-k1-k2-k1-k4-k1
[1] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/608/log.html.gz#s1-s5-t14-k2-k2-k2-k1-k4-k1
[2] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/608/odl1_karaf.log.gz
[3] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/607/odl1_karaf.log.gz



 Comments   
Comment by Vratko Polak [ 14/Aug/17 ]

> reproduce on Sandbox

That was not successful, and also Releng runs have not repeated this symptom since then.

Comment by Robert Varga [ 14/Aug/17 ]

Closing as we cannot reproduce it at this time. Feel free to reopen with new data.

Comment by Vratko Polak [ 17/Oct/17 ]

A similar failure happened in one Carbon CSIT run, but karaf.log is somewhat different.

First, a successful run shows [4]:
2017-10-10 08:36:00,798 | INFO | Event Dispatcher | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.6.1 | Bundle org.opendaylight.netconf.topology-singleton/1.2.2.Carbon is waiting for dependencies [Initial app config Config]
...
2017-10-10 08:36:00,931 | INFO | tAdminThread #18 | BlueprintBundleTracker | 149 - org.opendaylight.controller.blueprint - 0.6.2.Carbon | Blueprint container for bundle org.opendaylight.netconf.topology-singleton_1.2.2.Carbon [494] was successfully created
2017-10-10 08:36:00,953 | INFO | on-dispatcher-58 | NetconfTopologyUtils | 494 - org.opendaylight.netconf.topology-singleton - 1.2.2.Carbon | RemoteDevice

{netconf-test-device} : using the default directory cache/schema
2017-10-10 08:36:01,053 | INFO | lt-dispatcher-20 | NetconfNodeActor | 494 - org.opendaylight.netconf.topology-singleton - 1.2.2.Carbon | RemoteDevice{netconf-test-device}

: Schema context resolved:
...
2017-10-10 08:36:01,057 | INFO | lt-dispatcher-20 | SlaveSalFacade | 494 - org.opendaylight.netconf.topology-singleton - 1.2.2.Carbon | RemoteDevice

{netconf-test-device}: Slave mount point registered.

Now, unsuccessful run [5]:
2017-10-16 06:39:05,849 | INFO | Event Dispatcher | BlueprintContainerImpl | 15 - org.apache.aries.blueprint.core - 1.6.1 | Bundle org.opendaylight.netconf.topology-singleton/1.2.2.SNAPSHOT is waiting for dependencies [Initial app config Config]
...
2017-10-16 06:39:06,147 | INFO | on-dispatcher-78 | NetconfTopologyUtils | 494 - org.opendaylight.netconf.topology-singleton - 1.2.2.SNAPSHOT | RemoteDevice{netconf-test-device}

: using the default directory cache/schema
2017-10-16 06:39:06,154 | INFO | tAdminThread #18 | BlueprintBundleTracker | 149 - org.opendaylight.controller.blueprint - 0.6.2.SNAPSHOT | Blueprint container for bundle org.opendaylight.netconf.topology-singleton_1.2.2.SNAPSHOT [494] was successfully created

... and no more logs related to netconf-test-device or topology-singleton, until removal attempt:
2017-10-16 06:45:02,609 | WARN | on-dispatcher-53 | NetconfTopologyManager | 494 - org.opendaylight.netconf.topology-singleton - 1.2.2.SNAPSHOT | Error at closing topology context. InstanceIdentifier: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=topology-netconf]]], org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node[key=NodeKey [_nodeId=Uri [_value=netconf-test-device]]]]}

[4] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/422/odl2_karaf.log.gz
[5] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/423/odl2_karaf.log.gz

Comment by Vratko Polak [ 24/Oct/17 ]

Edit: Oops, I have not realized Carbon job had only one run in a week, and it failed due to infra, so this is already reported.

One more occurrence [6] (member-2 does not mount netconf-test-device after restart).
Once again, there is no netconf progress in karaf log [7] after:

2017-10-16 06:39:06,154 | INFO | tAdminThread #18 | BlueprintBundleTracker | 149 - org.opendaylight.controller.blueprint - 0.6.2.SNAPSHOT | Blueprint container for bundle org.opendaylight.netconf.topology-singleton_1.2.2.SNAPSHOT [494] was successfully created

[6] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/423/log.html.gz#s1-s12-t15-k2-k1-k2-k1-k4-k1
[7] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/423/odl2_karaf.log.gz

Generated at Wed Feb 07 20:15:03 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.