[NETCONF-397] Boron netconf cluster -all- job failures Created: 10/Apr/17 Updated: 15/Mar/19 Resolved: 07/May/17 |
|
| Status: | Verified |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Andrej Mak |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 8197 |
| Description |
|
The connection to testtool is closed. This is seen in karaf.log: Session is disconnected, failing RPC request <rpc message-id="m-2874" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"> The request itself looks OK; investigation is under way. [0] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-boron |
| Comments |
| Comment by Vratko Polak [ 11/Apr/17 ] |
|
This Bug does not happen if USC and CAPWAP features are removed from odl-integration-compatible-with-all. More details to follow. |
| Comment by Vratko Polak [ 11/Apr/17 ] |
|
It is sufficient to remove CAPWAP. https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-boron/6/ |
| Comment by Andrej Mak [ 13/Apr/17 ] |
|
It seems that, for some reason, slave mount points are registered about 30 s later than the master mount point. The test checks the condition "netconf-node-topology:connection-status" == "connected". This attribute is set by the master when the connection to the device is established and the master node mount point is registered. However, slave nodes begin schema resolution and mount point registration only after the master sets "connected".
Node 1 - master:
2017-04-13 01:15:32,375 | INFO | ssing-executor-8 | NetconfDevice | 284 - org.opendaylight.netconf.sal-netconf-connector - 1.4.4.SNAPSHOT | RemoteDevice{netconf-test-device} : Netconf connector initialized successfully
Node 2 slave:
2017-04-13 01:16:03,502 | INFO | lt-dispatcher-18 | SlaveSalFacade | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice{netconf-test-device} : Slave mount point registered.
Node 3 slave:
2017-04-13 01:16:03,505 | INFO | lt-dispatcher-18 | SlaveSalFacade | 509 - org.opendaylight.netconf.topology-singleton - 1.1.4.SNAPSHOT | RemoteDevice{netconf-test-device} : Slave mount point registered. |
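The gap described above means that a check passing on the master does not imply the slaves are ready. A minimal sketch of a test-side wait that polls every cluster member, not only the master, might look as follows. The function name, its parameters, and the `is_ready` callback are hypothetical and are not part of the netconf CSIT suites; what `is_ready` actually queries on each node (for example, whether the mount point is usable) is left to the caller.

```python
import time
from typing import Callable, Iterable, List


def wait_until_all_ready(
    members: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until is_ready(member) is true for every member.

    Returns the members that are still not ready when the timeout
    expires; an empty list means every member became ready in time.
    All names here are illustrative assumptions, not an existing API.
    """
    pending = list(members)
    deadline = clock() + timeout_s
    while True:
        # Keep only the members whose readiness check still fails.
        pending = [m for m in pending if not is_ready(m)]
        if not pending or clock() >= deadline:
            return pending
        sleep(interval_s)
```

With a timeout comfortably above the observed 30 s delay, such a wait would return an empty list once both slaves have registered their mount points; with a shorter timeout it would return the slaves that are still pending, which makes the late members visible in the test report.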
| Comment by Vratko Polak [ 19/Apr/17 ] |
|
CAPWAP features are no longer in odl-integration-compatible-with-all, but the job [0] still contains failures. |
| Comment by Vratko Polak [ 19/Apr/17 ] |
|
> slave mount points are registered about 30 s later than master mount point.
Confirmed. The interesting thing is that this only happens in the "CRUD" suite, which is executed second. It does not happen in the (basically the same) Bug8086 suite (executed third), and it does not happen in the somewhat similar Netconfready suite (executed first). Still no idea why slaves are late just in the second suite of
The only suspicious thing visible in karaf.log at INFO level is the following (happens on device master):
2017-04-17 16:26:25,348 | INFO | ssing-executor-3 | NetconfDevice | 262 - org.opendaylight.netconf.sal-netconf-connector - 1.4.4.Boron-SR4 | RemoteDevice {netconf-test-device}: Netconf connector initialized successfully |
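The delay can be measured directly from karaf.log lines such as the master and slave entries quoted in the 13/Apr/17 comment. The sketch below assumes only that each line starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS,mmm` followed by `|`, as in those entries; the helper names are hypothetical, and the sample lines are shortened with their timestamps kept as quoted.

```python
from datetime import datetime

# Timestamp layout seen at the start of the quoted karaf.log lines.
KARAF_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def karaf_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a karaf.log line."""
    # The timestamp is the first "|"-separated field of the line.
    first_field = line.split("|", 1)[0].strip()
    return datetime.strptime(first_field, KARAF_TS_FORMAT)


def delay_seconds(master_line: str, slave_line: str) -> float:
    """Seconds between the master entry and the slave entry."""
    delta = karaf_timestamp(slave_line) - karaf_timestamp(master_line)
    return delta.total_seconds()


# Shortened copies of the quoted entries (timestamps unchanged).
MASTER = "2017-04-13 01:15:32,375 | INFO | NetconfDevice | Netconf connector initialized successfully"
SLAVE_2 = "2017-04-13 01:16:03,502 | INFO | SlaveSalFacade | Slave mount point registered."
SLAVE_3 = "2017-04-13 01:16:03,505 | INFO | SlaveSalFacade | Slave mount point registered."

if __name__ == "__main__":
    for name, line in (("node 2", SLAVE_2), ("node 3", SLAVE_3)):
        print(name, round(delay_seconds(MASTER, line), 3), "s")
```

For the quoted entries this gives a delay of roughly 31 s for both slaves, consistent with the "about 30 s" observation.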
| Comment by Andrej Mak [ 24/Apr/17 ] |
|
From a sandbox run with enhanced logging, it seems that those 30 s on the slave node are spent downloading yang sources from the master during schema resolution. I will add additional logging to find the exact cause of this delay. I've attached logs from the CRUD suite with enhanced logging. Gerrit debug logs patch: |
| Comment by Andrej Mak [ 24/Apr/17 ] |
|
Attachment CRUD-debug-logs.zip has been added with description: CRUD-debug-logs |
| Comment by Andrej Mak [ 25/Apr/17 ] |
| Comment by Andrej Mak [ 25/Apr/17 ] |
|
Carbon: |
| Comment by Vratko Polak [ 07/May/17 ] |
|
The job is passing now [2]. |