[NETCONF-386] Cannot mount honeycomb in clustered odl on CentOS Created: 27/Mar/17  Updated: 15/Mar/19  Resolved: 07/Apr/17

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Juraj Linkeš Assignee: Jakub Morvay
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Attachments: Text File overcloud-controller-0-karaf-more-logs.log     Zip Archive overcloud-controller-0-karaf-try4.zip     Zip Archive overcloud-controller-1-karaf-more-logs.zip     Zip Archive overcloud-controller-1-karaf-try4.zip     Zip Archive overcloud-controller-2-karaf-more-logs.zip     Zip Archive overcloud-controller-2-karaf-try4.zip    
External issue ID: 8086

 Description   

Mounting fails with:

2017-03-27 12:20:41,306 | DEBUG | lt-dispatcher-68 | AbstractSchemaRepository | 81 - org.opendaylight.yangtools.yang-model-util - 1.1.0.SNAPSHOT | Failed to acquire source from AbstractObjectRegistration{instance=org.opendaylight.yangtools.yang.model.repo.spi.PotentialSchemaSource@5be51eb6}

akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://opendaylight-cluster-data@192.168.10.20:2550/user/akka.tcp:opendaylight-cluster-data@192.168.10.20:2550_overcloud-controller-0.opnfv.org#-823207737]] after [20000 ms]. Sender[null] sent message of type "org.opendaylight.netconf.topology.singleton.messages.YangTextSchemaSourceRequest".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:60

and

2017-03-27 12:20:41,306 | ERROR | lt-dispatcher-72 | Futures$CombinedFuture | 65 - com.google.guava - 18.0.0 | input future failed.
org.opendaylight.yangtools.yang.model.repo.api.MissingSchemaSourceException: All available providers exhausted



 Comments   
Comment by Juraj Linkeš [ 27/Mar/17 ]

Attachment overcloud-controller-0-karaf-try4.zip has been added with description: logs from controller-0

Comment by Juraj Linkeš [ 27/Mar/17 ]

Attachment overcloud-controller-1-karaf-try4.zip has been added with description: logs from controller-1

Comment by Juraj Linkeš [ 27/Mar/17 ]

Attachment overcloud-controller-2-karaf-try4.zip has been added with description: logs from controller-2

Comment by Juraj Linkeš [ 28/Mar/17 ]

Attachment overcloud-controller-0-karaf-more-logs.log has been added with description: Logs from controller-0 after adding more logging into code

Comment by Juraj Linkeš [ 28/Mar/17 ]

Attachment overcloud-controller-1-karaf-more-logs.zip has been added with description: Logs from controller-1 after adding more logging into code

Comment by Juraj Linkeš [ 28/Mar/17 ]

Attachment overcloud-controller-2-karaf-more-logs.zip has been added with description: Logs from controller-2 after adding more logging into code

Comment by Jakub Morvay [ 28/Mar/17 ]

It seems that the actual problem we are hitting here is that a custom schema-cache-directory does not work correctly in a clustered setting.

Mounting a netconf device in a clustered setting goes as follows. A leader for the netconf mountpoint is elected (with the help of the clustering singleton service); the leader mounts the device and notifies the followers of the mounting result. Part of the leader's mount process is device schema resolution.

The leader downloads all the schema sources from the device, registers them with its schema registry and schema repository, and finally builds the schema context. After that, the leader sends the followers a list of supported source ids and also creates an actor that serves the sources upon the followers' requests. It seems that this actor is injected with the wrong schema repository, so it cannot serve the followers' requests, and we see these failures in the log.

The bug should be present in any setting where we specify a custom schema-cache-directory for a netconf mount. The default setting shouldn't cause any problems.
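
The suspected failure mode described above can be illustrated with a toy model. This is only a sketch of the hypothesis, not the actual netconf-topology-singleton code: the class and function names (SchemaRepository, SourceServer, leader_mount, follower_resolve) and the module name example-module are all hypothetical, and the lookup fails immediately here, whereas in the real cluster the follower sees an AskTimeoutException.

```python
class SchemaRepository:
    """Toy stand-in for a schema repository: maps source ids to YANG text."""

    def __init__(self, name):
        self.name = name
        self._sources = {}

    def register(self, source_id, yang_text):
        self._sources[source_id] = yang_text

    def get_source(self, source_id):
        try:
            return self._sources[source_id]
        except KeyError:
            raise LookupError(
                f"{self.name}: no provider for {source_id}"
            ) from None


class SourceServer:
    """Toy stand-in for the leader-side actor that answers follower requests."""

    def __init__(self, repository):
        self._repository = repository

    def handle_request(self, source_id):
        return self._repository.get_source(source_id)


def leader_mount(device_sources, mount_repository, serving_repository):
    """Register the downloaded sources, then hand out ids and a server.

    The sources are registered with mount_repository, but the server is
    built on serving_repository; the hypothesis is that these two differ
    when a custom schema-cache-directory is configured.
    """
    for source_id, yang_text in device_sources.items():
        mount_repository.register(source_id, yang_text)
    return sorted(device_sources), SourceServer(serving_repository)


def follower_resolve(source_ids, server):
    """Follower side: request every advertised source from the leader."""
    return {sid: server.handle_request(sid) for sid in source_ids}


# Hypothetical device content, for illustration only.
DEVICE_SOURCES = {"example-module": "module example-module { }"}


def demo(correct_wiring):
    custom = SchemaRepository("custom-cache")
    default = SchemaRepository("default-cache")
    serving = custom if correct_wiring else default
    source_ids, server = leader_mount(DEVICE_SOURCES, custom, serving)
    return follower_resolve(source_ids, server)


if __name__ == "__main__":
    print(demo(correct_wiring=True))
    try:
        demo(correct_wiring=False)
    except LookupError as error:
        print("follower failed:", error)
```

With correct wiring the follower receives every advertised source. With the serving side pointed at a different repository, the first request fails even though the leader itself built its schema context successfully, which matches the pattern of a healthy leader and failing followers.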

Comment by Juraj Linkeš [ 28/Mar/17 ]

When using this configuration:
{
    "topology": [
        {
            "node": [
                {
                    "netconf-node-topology:actor-response-wait-time": 20,
                    "netconf-node-topology:between-attempts-timeout-millis": 10000,
                    "netconf-node-topology:connection-timeout-millis": 10000,
                    "netconf-node-topology:default-request-timeout-millis": 10000,
                    "netconf-node-topology:host": "192.168.10.25",
                    "netconf-node-topology:keepalive-delay": 0,
                    "netconf-node-topology:max-connection-attempts": 10,
                    "netconf-node-topology:password": "admin",
                    "netconf-node-topology:port": 2831,
                    "netconf-node-topology:tcp-only": false,
                    "netconf-node-topology:username": "admin",
                    "node-id": "overcloud-controller-0.opnfv.org"
                }
            ],
            "topology-id": "topology-netconf"
        }
    ]
}

the errors disappeared. The errors were present only when I used a custom schema-cache-directory, e.g. "netconf-node-topology:schema-cache-directory": "hcmount"
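
For anyone trying to reproduce the difference, the two payload variants can be generated side by side. This is a minimal sketch: the helper name node_config is hypothetical, the field values are copied from the configuration above, and how the payload is sent to the controller is deliberately left out.

```python
import json


def node_config(schema_cache_directory=None):
    """Build the mount payload; optionally add a custom schema cache directory."""
    node = {
        "netconf-node-topology:actor-response-wait-time": 20,
        "netconf-node-topology:between-attempts-timeout-millis": 10000,
        "netconf-node-topology:connection-timeout-millis": 10000,
        "netconf-node-topology:default-request-timeout-millis": 10000,
        "netconf-node-topology:host": "192.168.10.25",
        "netconf-node-topology:keepalive-delay": 0,
        "netconf-node-topology:max-connection-attempts": 10,
        "netconf-node-topology:password": "admin",
        "netconf-node-topology:port": 2831,
        "netconf-node-topology:tcp-only": False,
        "netconf-node-topology:username": "admin",
        "node-id": "overcloud-controller-0.opnfv.org",
    }
    if schema_cache_directory is not None:
        # The only difference between the working and the failing variant.
        node["netconf-node-topology:schema-cache-directory"] = schema_cache_directory
    return {"topology": [{"node": [node], "topology-id": "topology-netconf"}]}


if __name__ == "__main__":
    working = node_config()
    failing = node_config("hcmount")
    print(json.dumps(failing, indent=4))
```

The two payloads differ only in the schema-cache-directory key, which makes it easy to confirm that this single setting toggles the errors.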

Comment by Jakub Morvay [ 29/Mar/17 ]

https://git.opendaylight.org/gerrit/#/c/54017/

Comment by Vratko Polak [ 07/Apr/17 ]

A suite attempting to detect this Bug has been merged [0].
But I was unable to find a build where the suite fails.

The suite uses netconf-testtool and manipulates device data modeled by car.yang, which is both exported by testtool and present in the ODL-local /tmp/schema directory (on every cluster member).

Please let me know if you find a build that fails on the suite, or if you spot a difference between the scenario the suite is testing and the original Bug description.

[0] https://git.opendaylight.org/gerrit/54110

Generated at Wed Feb 07 20:14:53 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.