[NETCONF-397] Boron netconf cluster -all- job failures Created: 10/Apr/17  Updated: 15/Mar/19  Resolved: 07/May/17

Status: Verified
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Andrej Mak
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive CRUD-debug-logs.zip    
External issue ID: 8197

 Description   

The only job passes, and [0] the symptoms are different from CONTROLLER-1613.
Probably some other project is interfering.

Connection to testtool is closed, this is seen in karaf.log:
2017-04-10 02:07:08,435 | WARN | monPool-worker-0 | NetconfDeviceCommunicator | 284 - org.opendaylight.netconf.sal-netconf-connector - 1.4.4.SNAPSHOT | RemoteDevice

{test-device}

: Session is disconnected, failing RPC request <rpc message-id="m-2874" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-schema xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
<identifier>ietf-yang-types</identifier>
<version>2013-07-15</version>
<format>yang</format>
</get-schema>
</rpc>

The request itself look OK, investigation is under way.

[0] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-boron



 Comments   
Comment by Vratko Polak [ 11/Apr/17 ]

This Bug does not happen if USC and CAPWAP features are removed from odl-integration-compatible-with-all. More details to follow.

Comment by Vratko Polak [ 11/Apr/17 ]

It is sufficient to remove CAPWAP. https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-boron/6/

Comment by Andrej Mak [ 13/Apr/17 ]

It seems, that for some reason, slave mount points are registered about 30 s later than master mount point. Test checks "netconf-node-topology:connection-status" == "connected" condition. This attribute is set by master, when connection to device is established and master node mount point registered. However, slave nodes begin schema resolution and mount point registration only after setting "connected" by master.

https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-boron/477/archives/

Node 1 - master:
2017-04-13 01:15:32,374 | INFO | ssing-executor-8 | MasterSalFacade | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice

{netconf-test-device}: Creating master data broker for device
2017-04-13 01:15:32,375 | INFO | ssing-executor-8 | NetconfDevice | 284 - org.opendaylight.netconf.sal-netconf-connector - 1.4.4.SNAPSHOT | RemoteDevice{netconf-test-device}

: Netconf connector initialized successfully

Node 2 slave:
2017-04-13 01:16:03,501 | INFO | lt-dispatcher-18 | NetconfNodeActor | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice

{netconf-test-device}: Schema context resolved: [ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=Fri Sep 24 00:00:00 UTC 2010, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=yang-ext, namespace=urn:opendaylight:yang:extension:yang-ext, revision=Tue Jul 09 00:00:00 UTC 2013, prefix=ext, yangVersion=1], ModuleEffectiveStatementImpl[name=people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:people, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=people, yangVersion=1], ModuleEffectiveStatementImpl[name=test, namespace=urn:opendaylight:test:netconf:crud, revision=Sat Oct 18 00:00:00 UTC 2014, prefix=tt, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-yang-types, namespace=urn:ietf:params:xml:ns:yang:ietf-yang-types, revision=Mon Jul 15 00:00:00 UTC 2013, prefix=yang, yangVersion=1], ModuleEffectiveStatementImpl[name=car, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-people, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-purchase, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-purchase, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=cp, yangVersion=1], ModuleEffectiveStatementImpl[name=ncmount-l2fib, namespace=org:opendaylight:coretutorials:ncmount:example:l2fib, revision=Mon Mar 07 00:00:00 UTC 2016, prefix=l2fib, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=Mon Jul 15 00:00:00 UTC 2013, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring, revision=Mon Oct 04 00:00:00 UTC 2010, prefix=ncm, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring-extension, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring-extension, revision=Tue Dec 10 00:00:00 UTC 2013, prefix=ncme, yangVersion=1]]
2017-04-13 01:16:03,502 | INFO | lt-dispatcher-18 | SlaveSalFacade | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice{netconf-test-device}

: Slave mount point registered.

Node 3 slave:
2017-04-13 01:16:03,504 | INFO | lt-dispatcher-18 | NetconfNodeActor | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice

{netconf-test-device}: Schema context resolved: [ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=Fri Sep 24 00:00:00 UTC 2010, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=yang-ext, namespace=urn:opendaylight:yang:extension:yang-ext, revision=Tue Jul 09 00:00:00 UTC 2013, prefix=ext, yangVersion=1], ModuleEffectiveStatementImpl[name=people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:people, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=people, yangVersion=1], ModuleEffectiveStatementImpl[name=test, namespace=urn:opendaylight:test:netconf:crud, revision=Sat Oct 18 00:00:00 UTC 2014, prefix=tt, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-yang-types, namespace=urn:ietf:params:xml:ns:yang:ietf-yang-types, revision=Mon Jul 15 00:00:00 UTC 2013, prefix=yang, yangVersion=1], ModuleEffectiveStatementImpl[name=car, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-people, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-people, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=car, yangVersion=1], ModuleEffectiveStatementImpl[name=car-purchase, namespace=urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-purchase, revision=Mon Aug 18 00:00:00 UTC 2014, prefix=cp, yangVersion=1], ModuleEffectiveStatementImpl[name=ncmount-l2fib, namespace=org:opendaylight:coretutorials:ncmount:example:l2fib, revision=Mon Mar 07 00:00:00 UTC 2016, prefix=l2fib, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-inet-types, namespace=urn:ietf:params:xml:ns:yang:ietf-inet-types, revision=Mon Jul 15 00:00:00 UTC 2013, prefix=inet, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring, revision=Mon Oct 04 00:00:00 UTC 2010, prefix=ncm, yangVersion=1], ModuleEffectiveStatementImpl[name=ietf-netconf-monitoring-extension, namespace=urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring-extension, revision=Tue Dec 10 00:00:00 UTC 2013, prefix=ncme, yangVersion=1]]
2017-04-13 01:16:03,505 | INFO | lt-dispatcher-18 | SlaveSalFacade | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice{netconf-test-device}

: Slave mount point registered.

Comment by Vratko Polak [ 19/Apr/17 ]

CAPWAP features are no longer in odl-integration-compatible-with-all, but the job [0] still contains features.

Comment by Vratko Polak [ 19/Apr/17 ]

> slave mount points are registered about 30 s later than master mount point.

Confirmed.
This happens in current Boron snapshot builds (without Capwap, with or without USC).

The interesting thing is this only happens in "CRUD" suite, which is executed second. It does not happen in (basically the same) Bug8086 suite (executed third), and it does not happen in somewhat similar Netconfready suite (executed first).

Still no idea why slaves are late just in the second suite of all job.

The only suspicious thing visible in karaf.log at INFO level is the following (happens on device master):

2017-04-17 16:26:25,348 | INFO | ssing-executor-3 | NetconfDevice | 262 - org.opendaylight.netconf.sal-netconf-connector - 1.4.4.Boron-SR4 | RemoteDevice

{netconf-test-device}

: Netconf connector initialized successfully
2017-04-17 16:26:25,454 | INFO | on-dispatcher-62 | SfcProviderSffEntryDataListener | 452 - org.opendaylight.sfc.provider - 0.3.4.Boron-SR4 | org.opendaylight.sfc.provider.SfcProviderSffEntryDataListener.onDataChanged(SfcProviderSffEntryDataListener.java:94): Create SFF [SffName [_value=netconf-test-device]]
2017-04-17 16:26:25,457 | ERROR | on-dispatcher-66 | DataChangeListener | 191 - org.opendaylight.controller.sal-distributed-datastore - 1.4.4.Boron-SR4 | Error notifying listener org.opendaylight.sfc.sfc_lisp.provider.listener.SfcLispSffDataListener
java.lang.NullPointerException
at org.opendaylight.sfc.sfc_lisp.provider.LispUpdater.containsLispAddress(LispUpdater.java:125)[459:org.opendaylight.sfc.lisp:0.3.4.Boron-SR4]
at org.opendaylight.sfc.sfc_lisp.provider.listener.SfcLispSffDataListener.onDataChanged(SfcLispSffDataListener.java:64)[459:org.opendaylight.sfc.lisp:0.3.4.Boron-SR4]
at org.opendaylight.controller.md.sal.binding.impl.AbstractForwardedDataBroker$TranslatingDataChangeInvoker.onDataChanged(AbstractForwardedDataBroker.java:143)[161:org.opendaylight.controller.sal-binding-broker-impl:1.4.4.Boron-SR4]
at org.opendaylight.controller.cluster.datastore.DataChangeListener.dataChanged(DataChangeListener.java:73)[191:org.opendaylight.controller.sal-distributed-datastore:1.4.4.Boron-SR4]
at org.opendaylight.controller.cluster.datastore.DataChangeListener.handleReceive(DataChangeListener.java:45)[191:org.opendaylight.controller.sal-distributed-datastore:1.4.4.Boron-SR4]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:26)[185:org.opendaylight.controller.sal-clustering-commons:1.4.4.Boron-SR4]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)[173:com.typesafe.akka.actor:2.4.7]
at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[173:com.typesafe.akka.actor:2.4.7]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)[173:com.typesafe.akka.actor:2.4.7]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.7]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.7]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.7]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
2017-04-17 16:26:25,464 | INFO | on-dispatcher-68 | SfcSffToOvsMappingAPI | 454 - org.opendaylight.sfc.ovs - 0.3.4.Boron-SR4 | Cannot build OvsdbBridgeAugmentation. Missing OVS Bridge augmentation on SFF SffName [_value=netconf-test-device]

Comment by Andrej Mak [ 24/Apr/17 ]

From sandbox run with enhanced logging it seems, that those 30 s on slave node are spent to download yang sources from master during schema resolution. I will add additional logging to find exact cause of this delay. I've attached logs from CRUD suite with enhanced logging.

Gerrit debug logs patch:
https://git.opendaylight.org/gerrit/#/c/55792/

Comment by Andrej Mak [ 24/Apr/17 ]

Attachment CRUD-debug-logs.zip has been added with description: CRUD-debug-logs

Comment by Andrej Mak [ 25/Apr/17 ]

https://git.opendaylight.org/gerrit/#/c/55909/

Comment by Andrej Mak [ 25/Apr/17 ]

Carbon:
https://git.opendaylight.org/gerrit/#/c/55981
Boron:
https://git.opendaylight.org/gerrit/#/c/55998

Comment by Vratko Polak [ 07/May/17 ]

The job is passing now [2].

[2] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-boron/504/

Generated at Wed Feb 07 20:14:55 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.